Techniques for virtual machine transfer and resource management

ABSTRACT

Techniques for transferring virtual machines and resource management in a virtualized computing environment are described. In one embodiment, for example, an apparatus may include at least one memory, at least one processor, and logic for transferring a virtual machine (VM), at least a portion of the logic comprised in hardware coupled to the at least one memory and the at least one processor, the logic to generate a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of a physical device to be virtualized by the VM, the plurality of virtualized capability registers comprising a plurality of device-specific capabilities of the physical device, determine a version of the physical device to support via a virtual machine monitor (VMM), and expose a subset of the virtualized capability registers associated with the version to the VM. Other embodiments are described and claimed.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a national phase claiming the benefit of and priority to International Patent Application No. PCT/CN2017/079020, entitled “TECHNIQUES FOR VIRTUAL MACHINE TRANSFER AND RESOURCE MANAGEMENT,” filed Mar. 31, 2017, which is hereby incorporated by reference in its entirety.

This application relates to, but does not claim priority to, International Patent Application No. PCT/CN2018/080793, filed Mar. 28, 2018. The contents of the aforementioned application are incorporated herein by reference.

TECHNICAL FIELD

Embodiments described herein generally relate to information processing and, more specifically, but not exclusively, to techniques for management of information processing systems in a virtualization computing environment.

BACKGROUND

Virtualization generally refers to the use of computer software to emulate or otherwise implement the functions or features of a physical computing device to allow, among other things, sharing of physical computing resources. For example, multiple virtual machines (VMs) can run concurrently on the same physical computing system. In the context of input-output (I/O) devices, virtualization technologies can be used to allow multiple VMs to share the same I/O hardware resources. Various systems, such as accelerators and high-performance I/O devices, may be virtualized using single root I/O virtualization (SR-IOV) or other architectures that provide for direct access to I/O devices or functions thereof from VMs.

A VM typically runs a preconfigured operating system (OS) image that includes required applications and/or drivers. In a data center environment, deployment of a VM may include providing a VM image from remote storage for installation on a physical machine. The data center may include a large number of physical machines that may include different hardware and/or software elements. For example, certain of the physical machines may include different generations of an I/O device. The presence of different hardware and/or software elements may lead to deployment compatibility issues when attempting to locate a target physical machine that is capable of operating with a particular VM being deployed within the data center. Migration of a VM generally involves transferring a VM from a source physical machine to a target physical machine within the data center. VM migration typically requires migrating various states of the VM from the source physical machine to the target physical machine, often over a plurality of phases. Illustrative states may include virtual memory state, virtual central processing unit (CPU) state, and virtual device state. Conventional migration techniques, such as within architectures that allow direct access to I/O devices from VMs, do not fully support complete and efficient migration of the virtual states from the source physical machine to the target physical machine. For example, existing VM systems lack driver interfaces for efficiently and accurately migrating device capabilities and/or device states.

In conventional virtualization computing environments, management of resources does not allow for flexible sharing or reassignment of VM processes. For instance, if system and/or VM resources are committed to a first function, the system and/or VM resources may not be reassigned to a second function, even if the first function is idle or does not require the system and/or VM resources. Therefore, VM management components, such as hypervisors or virtual machine managers (VMMs), are not able to fully utilize system and/or VM resources.

Accordingly, techniques for comprehensive and efficient virtualization resource management, deployment, and migration of VMs within a data center may be desired.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an embodiment of a first operating environment.

FIG. 2 illustrates an embodiment of a second operating environment.

FIG. 3 illustrates an embodiment of a third operating environment.

FIG. 4 illustrates an embodiment of a fourth operating environment.

FIG. 5 illustrates an embodiment of a fifth operating environment.

FIG. 6 illustrates an embodiment of a sixth operating environment.

FIG. 7 depicts an illustrative logic flow according to a first embodiment.

FIG. 8 illustrates an example of a storage medium.

FIG. 9 illustrates an example computing platform.

DETAILED DESCRIPTION

Various embodiments may be generally directed toward systems and techniques for deploying and migrating (“transferring”) domains, such as virtual machines (VMs) and/or containers, within a virtualized computing environment. Various embodiments may also be generally directed toward systems and techniques for managing resources within a virtualized computing environment. In some embodiments, the virtualized computing environment may use a single root input/output virtualization (SR-IOV) architecture. In various embodiments, the virtualized computing environment may use a scalable input/output virtualization (S-IOV) architecture. In some embodiments, the S-IOV architecture may employ a virtual device composition module (VDCM) configured to compose a virtual device (VDEV) using emulation and/or direct assignment such that non-performance critical VDEV state may be implemented in the VDCM.

In some embodiments, a device may expose its device-specific capabilities through a memory-mapped I/O (MMIO) space enumerated by the device driver. The device driver may implement software interfaces that allow a virtual machine manager (VMM) to query the device capabilities. Accordingly, a VMM may expose to the VM a subset of the device-specific capabilities that is supported by multiple generations of an I/O device. In this manner, a data center resource manager (DCRM) may create a pool of compatible servers with devices (such as SR-IOV devices) to which a VM image may be transferred (for instance, deployed or migrated).
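As a non-limiting illustration, the following Python sketch models how capability information queried from the host drivers of several servers might be reduced to a subset that every device generation in a pool supports, and how a DCRM might then select compatible servers. The `DevCap` bit assignments, the `Server` record, and the functions `common_capabilities` and `compatible_pool` are hypothetical names chosen for illustration; they do not represent any particular device or driver interface.

```python
from dataclasses import dataclass
from enum import IntFlag
from functools import reduce
import operator
from typing import List


class DevCap(IntFlag):
    """Hypothetical device-specific capability bits (illustrative only)."""
    QUEUE_V1 = 0x1
    QUEUE_V2 = 0x2
    CRYPTO = 0x4
    COMPRESS = 0x8


@dataclass(frozen=True)
class Server:
    name: str
    device_caps: DevCap  # as reported through the host driver's query interface


def common_capabilities(servers: List[Server]) -> DevCap:
    """Capabilities supported by every device generation in the pool (bitwise AND)."""
    if not servers:
        raise ValueError("pool must contain at least one server")
    return reduce(operator.and_, (s.device_caps for s in servers))


def compatible_pool(servers: List[Server], required: DevCap) -> List[str]:
    """Names of servers whose device offers every capability the VM image requires."""
    return [s.name for s in servers if (s.device_caps & required) == required]


if __name__ == "__main__":
    pool = [
        Server("gen1-host", DevCap.QUEUE_V1 | DevCap.CRYPTO),
        Server("gen2-host", DevCap.QUEUE_V1 | DevCap.QUEUE_V2 | DevCap.CRYPTO | DevCap.COMPRESS),
    ]
    exposed = common_capabilities(pool)
    print(compatible_pool(pool, exposed))          # both hosts qualify
    print(compatible_pool(pool, DevCap.QUEUE_V2))  # only the newer generation
```

Exposing only the intersection is analogous to the CPUID virtualization that VMMs use to hide newer CPU features from VMs.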

In various embodiments, an I/O device may implement device-specific interfaces for VDEV states, such as performance-critical VDEV states. Non-limiting examples of performance-critical VDEV states may include assignable interfaces (AIs) (see FIGS. 5 and 6), for instance, within an S-IOV architecture. In some embodiments, the I/O device may implement device-specific interfaces for VDEV states to stop/suspend AIs (or command interfaces), save AI state, and restore AI state on the target AI. In some embodiments, if a device supports page request service (PRS) to support shared virtual memory (SVM), virtual central processing units (VCPUs) and virtual functions (VFs) may be suspended without affecting PRS handling. In various embodiments, a host driver may implement software interfaces for the VMM to request such operations (for instance, suspend, save, and/or restore). Accordingly, a VMM may efficiently migrate VDEV state during VM migration by saving AI state, migrating AI state along with non-performance critical states (which may be managed by the VDCM), and restoring VDEV state on a destination AI. In some embodiments, accessed and dirty (A/D) bit support may be implemented in an I/O memory management unit (IOMMU), for instance, in a second-level address translation table (for example, of a CPU or other processing device).
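The suspend, save, transfer, and restore sequence described above can be sketched as follows. This is a minimal Python model under stated assumptions: `AssignableInterface`, `VirtualDevice`, and `migrate_vdev` are hypothetical names, AI state is reduced to a dictionary of register values, and copying a dictionary stands in for transferring state between physical machines. PRS handling and draining of in-flight commands are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AssignableInterface:
    """Hypothetical AI: performance-critical state that lives on the I/O device."""
    ai_id: int
    running: bool = True
    regs: Dict[str, int] = field(default_factory=dict)

    def suspend(self) -> None:
        # Stop accepting new work (a real device would also drain in-flight commands).
        self.running = False

    def save_state(self) -> Dict[str, int]:
        if self.running:
            raise RuntimeError("AI must be suspended before its state is saved")
        return dict(self.regs)

    def restore_state(self, state: Dict[str, int]) -> None:
        self.regs = dict(state)
        self.running = True


@dataclass
class VirtualDevice:
    """VDEV = directly assigned AI + non-performance-critical state kept by the VDCM."""
    ai: AssignableInterface
    vdcm_state: Dict[str, Any] = field(default_factory=dict)


def migrate_vdev(src: VirtualDevice, dst_ai: AssignableInterface) -> VirtualDevice:
    """Move a VDEV's state onto a destination AI (hypothetical host-driver flow)."""
    src.ai.suspend()                          # 1. suspend the source AI
    ai_state = src.ai.save_state()            # 2. save device-side AI state
    payload = {"ai": ai_state,                # 3. send AI state along with
               "vdcm": dict(src.vdcm_state)}  #    the VDCM-managed state
    dst_ai.restore_state(payload["ai"])       # 4. restore on the destination AI
    return VirtualDevice(ai=dst_ai, vdcm_state=payload["vdcm"])
```

Keeping the VDCM-managed portion in host software means only the comparatively small AI state must be extracted from the device.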

In some embodiments, an apparatus may include at least one memory, at least one processor, and logic for transferring a virtual machine (VM), at least a portion of the logic comprised in hardware coupled to the at least one memory and the at least one processor, the logic to generate a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of a physical device to be virtualized by the VM, the plurality of virtualized capability registers comprising a plurality of device-specific capabilities of the physical device, determine a version of the physical device to support via a virtual machine monitor (VMM), and expose a subset of the virtualized capability registers associated with the version to the VM.
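The three operations recited above (generating virtualized capability registers, determining a version to support, and exposing the associated subset) can be illustrated with the following Python sketch. The register names, the `VERSION_CAPS` table, and the policy of supporting the oldest version in the pool are assumptions made for illustration only; actual capability layouts are device-specific.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical table: capability registers defined by each device version.
VERSION_CAPS: Dict[int, FrozenSet[str]] = {
    1: frozenset({"CAP_QUEUES", "CAP_MSIX"}),
    2: frozenset({"CAP_QUEUES", "CAP_MSIX", "CAP_PASID"}),
    3: frozenset({"CAP_QUEUES", "CAP_MSIX", "CAP_PASID", "CAP_QOS"}),
}


def virtualize_capability_registers(physical: Dict[str, int]) -> Dict[str, int]:
    """Build VMM-owned copies of the device-specific capability registers."""
    return dict(physical)


def choose_version(pool_versions: Iterable[int]) -> int:
    """Support the oldest version present so every host in the pool can run the VM."""
    return min(pool_versions)


def expose_for_version(vcaps: Dict[str, int], version: int) -> Dict[str, int]:
    """Return only the virtualized registers associated with the chosen version."""
    allowed = VERSION_CAPS[version]
    return {name: value for name, value in vcaps.items() if name in allowed}
```

Because the VM reads the virtualized copies rather than the physical registers, a newer device can present itself to the VM as an older version.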

In some embodiments, an I/O device and/or VMMs may be configured such that the VMMs may use emulation to provide resource management functionalities that overcome the deficiencies of conventional SR-IOV systems, such as mapping VDEVs directly to device backend resources (for instance, physical device functions, data resources, or other device resources), emulating MMIO interfaces to implement VMM memory overcommit, dynamically mapping VDEV paths (for instance, fast paths) between emulation and direct access to implement device overcommit, and/or composing a VDEV from AIs on multiple I/O devices.
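One of these functionalities, dynamically switching a VDEV path between direct access and emulation to support device overcommit, can be sketched as follows. In this hypothetical Python model (`BackendAI` and `VDev` are illustrative names), commands submitted while the VDEV has no backend are held by the emulated path and replayed once a backend is mapped. A real VMM would accomplish the switch by changing how the guest's MMIO accesses are mapped or trapped, which is not modeled here.

```python
from collections import deque
from typing import Deque, List, Optional


class BackendAI:
    """Hypothetical device backend resource that executes commands."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def submit(self, cmd: str) -> None:
        self.executed.append(cmd)


class VDev:
    """VDEV whose fast path can be switched between direct access and emulation."""

    def __init__(self) -> None:
        self.backend: Optional[BackendAI] = None
        self.pending: Deque[str] = deque()  # commands held by the emulated path

    @property
    def path(self) -> str:
        return "direct" if self.backend is not None else "emulated"

    def submit(self, cmd: str) -> None:
        if self.backend is not None:
            self.backend.submit(cmd)   # direct access: goes straight to the device
        else:
            self.pending.append(cmd)   # emulated: the VMM holds the command

    def map_direct(self, backend: BackendAI) -> None:
        """Attach a backend and replay any commands queued while emulated."""
        self.backend = backend
        while self.pending:
            backend.submit(self.pending.popleft())

    def unmap(self) -> BackendAI:
        """Reclaim the backend (e.g., for another VM); later commands are emulated."""
        if self.backend is None:
            raise RuntimeError("VDEV has no backend to reclaim")
        backend, self.backend = self.backend, None
        return backend
```

Because the guest sees the same VDEV in both modes, backend resources can be reassigned from an idle VM to a busy one without the guest driver being aware of the change.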

In the following description, numerous specific details such as processor and system configurations are set forth in order to provide a more thorough understanding of the described embodiments. However, the described embodiments may be practiced without such specific details. Additionally, some well-known structures, circuits, and the like have not been shown in detail, to avoid unnecessarily obscuring the described embodiments.

FIG. 1 illustrates an example of an operating environment 100 such as may be representative of some embodiments. As shown in FIG. 1, operating environment 100 may include a VM environment having a computing device 120, for instance, implemented as a processor-based platform configured to execute a VMM 110. Although implemented in software, VMM 110 may emulate and export a virtual machine interface to higher-level software. Such higher-level software may comprise a standard OS, a real-time OS, or a stripped-down environment with limited OS functionality that, for example, may not include OS facilities available in a standard OS in some embodiments. Alternatively, for example, the VMM 110 may be run within, or using the services of, another VMM. VMMs may be implemented, for example, in hardware, software, firmware, and/or any combination thereof. In at least one embodiment, one or more components of the VMM 110 may execute in one or more virtual machines and one or more components of the VMM 110 may execute on the platform hardware as depicted in FIG. 1. The components of the VMM 110 executing directly on the platform may be referred to as host components of the VMM 110. In another embodiment, examples of VMM 110 may comprise a hybrid virtual machine monitor, a host virtual machine monitor, or a hypervisor virtual machine monitor.

The computing device 120 may include various logic devices. Non-limiting examples of computing devices 120 may include a personal computer (PC), a server, a mainframe, a handheld device such as a personal digital assistant (PDA), a tablet, a smart phone or any other smart device, an Internet Protocol device, a digital camera, a portable computer, a handheld PC such as a netbook or notebook, an embedded applications device such as a microcontroller, a digital signal processor (DSP), a system on a chip (SoC), a network computer (NetPC), a set-top box, a network hub, a wide area network (WAN) switch, another processor-based system, and/or any combination thereof. Embodiments are not limited in this context.

Computing device 120 may include at least a processor 122 and memory 126. Processor 122 may be any type of processor capable of executing programs, such as a microprocessor, digital signal processor, microcontroller, and/or the like. Although FIG. 1 shows only one such processor 122, computing device 120 may include a plurality of processors 122. Additionally, processor 122 may include multiple cores, support for multiple threads, and/or the like. Processor 122 may include microcode, programmable logic, or hard-coded logic to perform operations associated with various embodiments described herein.

Memory 126 may comprise a hard disk, a floppy disk, random access memory (RAM), read only memory (ROM), flash memory, any other type of volatile or non-volatile memory device, a combination of the above devices, or any other type of machine medium readable by processor 122 in various embodiments. Memory 126 may store instructions and/or data for performing program execution and other method embodiments. In some embodiments, some elements of the disclosure may be implemented in other system components, for example, in the platform chipset or in the system's one or more memory controllers.

VMM 110 may present to guest software an abstraction of one or more VMs 104 a-n. VMM 110 may present the same or different abstractions of the physical platform to different VMs 104 a-n. Guest software, such as guest software running on each of VMs 104 a-n, may include a guest OS, such as guest OSs 108 a-n, and various guest software applications 106 a-n. Guest software applications 106 a-n may access physical resources (for instance, processor registers, memory, and I/O devices) within the VMs 104 a-n on which guest software applications 106 a-n are running and may perform various other functions. For example, guest software applications 106 a-n may have access to all registers, caches, structures, I/O devices, memory and/or the like, according to the architecture of the processor and platform presented in VMs 104 a-n.

In one embodiment, processor 122 may control the operation of VMs 104 a-n. In one embodiment, in response to a VM 104 a-n referencing a memory location in its virtual address space, a reference to an actual address in the physical memory of the host machine 120 (machine physical memory) may be generated by a memory management module (not shown) in VMM 110, which may be implemented in hardware (sometimes incorporated into processor 122) and software (for example, in the operating system of the host machine). In the embodiment of FIG. 1, computing device 120 may include one or more I/O devices 124. Computing device 120 may include one or more graphic control devices 128 that may be used to perform one or more graphics functions.

FIG. 2 illustrates an example of an operating environment 200 such as may be representative of some embodiments. More specifically, operating environment 200 may include a VM environment implemented using an SR-IOV architecture. As shown in FIG. 2, operating environment 200 may include a host 205, VMs 210 a-n, a VMM 220, and an SR-IOV device 230. Each of VMs 210 a-n may include one or more VF drivers 212 a-212 n that interact with respective VFs 232 a-232 n that are implemented, for instance, in SR-IOV device 230. In various embodiments, VFs 232 a-n may include VF base address registers (BARs) 260 a-n, for example, VF BAR MMIO spaces, and/or VF configuration spaces 262 a-n, for example, PCIe configuration spaces. A physical function (PF) may be implemented using PF drivers 222 operating in host 205 (or in VMM 220). PF driver 222 may include PF BARs 250 and PF configuration space 252. In various embodiments, SR-IOV device 230 may implement device resource remapping logic 270 to map each VF 232 a-n to backend resources 272 a-n (for instance, queues, contexts, etc.).
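The role of device resource remapping logic 270 can be illustrated with a small Python model in which each VF is given backend queues from a free list and a VF-relative queue index is translated to the backend queue it actually targets. The `ResourceRemapper` class and its allocation policy are assumptions for illustration; in an SR-IOV device the remapping is performed in hardware and is device-specific.

```python
from typing import Dict, List


class ResourceRemapper:
    """Hypothetical model: (VF, VF-local queue index) -> backend queue."""

    def __init__(self, num_backend_queues: int) -> None:
        self.free: List[int] = list(range(num_backend_queues))
        self.table: Dict[int, List[int]] = {}

    def assign(self, vf: int, count: int) -> List[int]:
        """Give a VF `count` backend queues from the free list."""
        if vf in self.table:
            raise ValueError(f"VF {vf} already has backend resources")
        if count > len(self.free):
            raise RuntimeError("not enough free backend queues")
        queues, self.free = self.free[:count], self.free[count:]
        self.table[vf] = queues
        return queues

    def translate(self, vf: int, local_queue: int) -> int:
        """Map the queue index used by a VF driver to the real backend queue."""
        queues = self.table[vf]
        if not 0 <= local_queue < len(queues):
            raise IndexError(f"VF {vf} has no queue {local_queue}")
        return queues[local_queue]
```

Each VF driver numbers its queues from zero, while the remapping keeps the VFs isolated on distinct backend resources.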

Various systems, such as accelerators and high-performance I/O devices, may be virtualized using an SR-IOV architecture, such as the architecture depicted in FIG. 2. In general, SR-IOV may specify that the physical device may include a single PF 240 and multiple VFs 232 a-n, such that each VF 232 a-n can be assigned to a VM 210 a-n directly. A VF 232 a-n may appear as a virtual I/O device to the VM 210 a-n. VF driver 212 a-n in VM 210 a-n may directly access VF 232 a-n MMIO space without VMM 220 involvement. VF drivers 212 a-n may directly submit work to respective VFs 232 a-n and the device may process the work from various VFs 232 a-n (or VMs 210 a-n) while providing isolation among the various VFs 232 a-n. For instance, a VF driver 212 a-n may see a real VF 232 a-n and all of the capabilities of that VF. VF drivers 212 a-n may interact with VFs 232 a-n directly by using a CPU second-level page table (for example, an extended page table (EPT), such as EPT 224 a-n, and/or the like).

A VM typically runs a preconfigured OS image with all required packages, applications, device drivers (including drivers for assigned devices and SR-IOV VFs), and/or the like. In a data center environment, deploying (or starting) a VM may include downloading a VM image from a remote storage location to a physical machine, for instance, using a data center resource manager (DCRM). The DCRM may be responsible for identifying a target physical machine when deploying the VM and/or migrating the VM. In the data center, there may be a large number of physical machines with certain types of SR-IOV devices and, therefore, a large number of target machines for a VM image. However, SR-IOV devices on such systems may not all be the same version, for instance, as data center hardware is regularly updated, replaced, and/or the like. Accordingly, for each type of SR-IOV device, target systems could have multiple generations of the I/O device (for instance, certain machines with a newer version of the I/O device, while others may have older generations). In addition, different generations of I/O devices may not be fully compatible with each other. For instance, a newer generation I/O device may contain new features that an older generation I/O device may not have.

The presence of multiple generations of I/O devices in the data center, coupled with the current SR-IOV virtualization architecture (for example, where guest OSs directly access assigned devices or VFs without any VMM involvement), may lead to migration and/or deployment compatibility issues within the data center environment. For example, it may be challenging, if not impossible, for a DCRM to select a target physical machine for deploying or migrating a VM with an assigned SR-IOV device VF. Since the VF driver sees the entire VF MMIO space and can access all its capabilities directly without any VMM involvement, the VF driver contained in the VM image must be compatible with the SR-IOV device present on the target system. Such requirements may mean that the VM image (for instance, containing the VF driver) may not work with all of the generations of the SR-IOV device that exist in the data center. The DCRM may be challenged because it must find a target system with a compatible generation of the SR-IOV device for the VM image when starting or migrating the VM. Accordingly, even if there are available SR-IOV devices in the data center, the VM may not be able to use them because the VF driver may not be compatible with the SR-IOV VF. Such conditions may occur frequently in conventional data centers where multiple generations of the same SR-IOV device may be present on different systems.

Such multiple-generation issues may also occur with CPUs, as various generations of CPUs may be present in a conventional data center. However, CPUs may include a CPU identification (CPUID) mechanism for software to find out the capabilities of the CPU. VMMs may use CPUID virtualization to enumerate capabilities and hide incompatible capabilities of newer CPUs from the VMs. In this manner, VMMs may create pools of compatible servers where the same VM image may be downloaded and run. However, conventional I/O devices lack an identifier functionality that is analogous to the CPUID functionality of CPUs.

In conventional SR-IOV architectures, the VMM does not control the VDEV seen by the VF driver. Each VF looks like a whole device to the VM, and the VF driver directly accesses the VF (for instance, the entire MMIO space and all of its capabilities) without involving the VMM. Accordingly, the VDEV in the VM is composed of an entire VF in the SR-IOV device. Therefore, the VMM cannot selectively hide or expose VF capabilities to the VF driver.

VMs may be migrated in a live or active state from a source physical machine to a target physical machine. Live VM migration requires migrating the memory state from the source physical machine to the target physical machine. In the first phase, the VMM migrates the entire memory to the target physical machine while the VM is still running. In subsequent phases, the VMM migrates only the memory pages modified since the previous phase. The memory pages may be modified by the CPU and/or I/O devices. The VMM can track the CPU-modified (or “dirty”) pages, for example, through the A/D (accessed/dirty) bit support in extended page tables (EPTs). However, if the device (or the VF) is directly assigned to the VM, the VMM cannot determine the pages modified by the device (or the VF) because the VM directly submits memory commands (for example, direct memory accesses (DMAs)) to the device without VMM involvement. Accordingly, live migration is challenging, particularly if a VM has an assigned VF. Furthermore, conventional VMMs may not support VM migration if a VM has an assigned VF. A main issue with live memory migration of VMs with an assigned VF is that there is no A/D bit support in the IOMMU second-level page tables. Accordingly, there is no way for the VMM to detect VM pages modified by the assigned VFs.
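The phased (pre-copy) memory migration described above, and the gap created by device writes that are not tracked, can be illustrated with the following Python sketch. The memory model (a dictionary mapping page numbers to contents), the `run_vm_round` callback that stands in for the running VM and reports tracked writes the way EPT A/D bits might report CPU writes, and the `max_rounds` and `stop_threshold` parameters are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, Set

Memory = Dict[int, bytes]  # page number -> page contents


def live_migrate(src: Memory,
                 run_vm_round: Callable[[Memory], Set[int]],
                 max_rounds: int = 5,
                 stop_threshold: int = 2) -> Memory:
    """Pre-copy live migration sketch.

    `run_vm_round` simulates the VM running during one copy phase: it may write
    to `src` and returns the set of pages whose writes were tracked (e.g., CPU
    writes reported through EPT A/D bits). Untracked writes are never re-sent.
    """
    dst: Memory = dict(src)            # first phase: copy all memory
    dirty = run_vm_round(src)          # VM keeps running during the copy
    for _ in range(max_rounds):
        if len(dirty) <= stop_threshold:
            break
        for page in dirty:             # later phases: only pages dirtied since
            dst[page] = src[page]      # the previous phase
        dirty = run_vm_round(src)
    # Final phase: the VM is paused, so the remaining dirty pages are copied last.
    for page in dirty:
        dst[page] = src[page]
    return dst
```

If `run_vm_round` modifies a page without reporting it, as a DMA write from an assigned VF would in the absence of IOMMU A/D bit support, the destination copy silently diverges from the source.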

In general, VM migration requires a VMM to migrate various VM states from a source physical machine to a target physical machine. Non-limiting examples of VM states may include a virtual (or VM) memory state, a virtual CPU (VCPU) state, and a virtual device state. If the VM has an assigned SR-IOV VF, the virtual device state may also include a VF state. However, since the VF driver accesses the VF MMIO space without any VMM involvement, it is challenging for the VMM to migrate the VF state. For instance, an SR-IOV device may implement a PF driver interface to stop/suspend a VF, store the VF state on the source machine, and restore the VF state on the target machine. However, implementing processes for performing such VF state storage/restoration on a conventional SR-IOV device is complex, costly, and inefficient because a VF has significant state information on the SR-IOV device and the VF driver directly accesses the VF MMIO space without any VMM involvement.

Certain SR-IOV devices may support page request service (PRS), for instance, to support shared virtual memory (SVM). Suspension of VFs and VCPUs is generally more complex in a PRS-based system because PRS requests require the VM's VCPUs to be running. If a VCPU is running, however, the VCPU may continue to submit I/O commands to the device, potentially generating more PRS requests, so neither the VCPUs nor the VF can simply be stopped first. Accordingly, conventional systems do not include a PF driver interface to stop/suspend a VF and save/restore a VF state. As a consequence, VM migration is typically not supported if the VM has assigned VFs (or PFs).

In addition, although SR-IOV allows an I/O device to be shared among multiple VMs, conventional SR-IOV architectures experience significant resource management limitations when virtualizing I/O devices. For instance, VM memory overcommit and VF overcommit conditions are difficult, if not impossible, to implement using conventional SR-IOV architecture and/or software. The implementation challenges arise, among other things, because each VF implements an entire MMIO space of a VDEV and the VF driver (in the VM) directly accesses VF MMIO space without involving the VMM. Accordingly, the VDEV MMIO space in the VM may be composed entirely of VF MMIO space in the I/O device.

In some embodiments, a virtual environment may include an S-IOV architecture, for instance, where a virtual device composition module (VDCM) composes a VDEV using both emulation and direct assignment such that non-performance critical VDEV state(s) may be implemented in the VDCM. In addition, SR-IOV devices typically implement some, all, or substantially all I/O virtualization resource management processes in hardware. Consequently, SR-IOV software architecture implements limited resource management capabilities. However, an S-IOV architecture may implement various processes (including resource management), for instance, non-performance critical processes, in software instead of hardware. In this manner, a VMM may have more opportunities to implement drastically improved resource management capabilities in host software.

FIG. 3 illustrates an example of an operating environment 300 such as may be representative of some embodiments. More specifically, operating environment 300 may include a VM environment implemented using an S-IOV architecture. As shown in FIG. 3, operating environment 300 may include a system 305 configured as an information processing system for scalable virtualization of I/O devices (S-IOV). System 305 may represent any type of information processing system, such as a server, a desktop computer, a portable computer, a set-top box, a handheld device such as a tablet or a smart phone, or an embedded control system. System 305 may include a processor 312, a memory controller 314, a host fabric controller 316, an I/O controller 340, an I/O memory management unit (IOMMU) 142, a system memory 320, a graphics processor 330, and/or a hardware accelerator 350. System 305 may include any number of each of these components and any other components or other elements, such as additional peripherals and/or I/O devices. Any or all of the components or other elements in this or any system embodiment may be connected, coupled, or otherwise in communication with each other through any number of buses, point-to-point, or other wired or wireless interfaces or connections, unless specified otherwise. Any components or other portions of system 305, whether shown in FIG. 3 or not shown in FIG. 3, may be integrated or otherwise included on or in a single chip (a system-on-a-chip or SOC), die, substrate, or package, such as SOC 110.

System memory 320 may include, for example, dynamic random access memory (DRAM) or any other type of medium readable by processor 312. Memory controller 314 may represent any circuitry or component for accessing, maintaining, and/or otherwise controlling system memory 320. Host fabric controller 316 may represent any circuitry or component for controlling an interconnect network or fabric through which processors and/or other system components may communicate. Graphics processor 330 may include any processor or other component for processing graphics data for display 332. Hardware accelerator 350 may represent any cryptographic, compression, or other accelerator to which a processor may offload functionality such as the hardware acceleration of encryption or compression algorithms. I/O controller 340 may represent any circuitry or component, such as a chipset component, including or through which peripheral, input/output (I/O), or other components or devices, such as I/O device 344 (e.g., a touchscreen, keyboard, microphone, speaker, other audio device, camera, video or other media device, motion or other sensor, receiver for global positioning or other information, etc.), NIC 346, and/or information storage device 348, may be connected or coupled to processor 312. Information storage device 348 may represent any one or more components including any one or more types of persistent or non-volatile memory or storage, such as a flash memory and/or a solid state, magnetic, or optical disk drive, and may include its own storage device controller 349.

Processor 312 may represent all or part of a hardware component including one or more processors or processor cores integrated on a single substrate or packaged within a single package, each of which may include multiple execution threads and/or multiple execution cores, in any combination. Each processor represented as or in processor 312 may be any type of processor, including a general purpose microprocessor, such as a processor in the Intel® Core® Processor Family or other processor family from Intel® Corporation or another company, a special purpose processor or microcontroller, or any other device or component in an information processing system in which an embodiment of the present specification may be implemented.

Processor 312 may be architected and designed to operate according to any instruction set architecture (ISA), with or without being controlled by microcode. Processor 312 may support virtualization according to any approach. For example, processor 312 may operate in the following two modes: a first mode in which software runs directly on the hardware, outside of any virtualization environment, and a second mode in which software runs at its intended privilege level, but within a virtual environment hosted by a VMM running in the first mode. In the virtual environment, certain events, operations, and situations, such as interrupts, exceptions, and attempts to access privileged registers or resources, may be intercepted, for instance, to cause the processor to exit the virtual environment (a VM exit) so that the VMM may operate, for example, to implement virtualization policies. The processor may support instructions for establishing, entering (a VM entry), exiting, and/or maintaining a virtual environment, and may include register bits or other structures that indicate or control virtualization capabilities of the processor.

FIG. 4 illustrates an example of an operating environment 400 such as may be representative of some embodiments. More specifically, operating environment 400 may include a VM environment implemented using an S-IOV architecture. FIG. 4 illustrates processor 405, which may represent an embodiment of processor 312 in FIG. 3 or an execution core of a multicore processor embodiment of processor 312 in FIG. 3. Processor 405 may include a storage unit 410, an instruction unit 420, an execution unit 430, a control unit 440, and/or a memory management unit (MMU) 450. Processor 405 may also include any other circuitry, structures, or logic not shown in FIG. 4.

Storage unit 410 may include any combination of any type of storage usable for any purpose within processor 405. For example, storage unit 410 may include any number of readable, writable, and/or read-writable registers, buffers, and/or caches, implemented using any memory or storage technology, in which to store capability information, configuration information, control information, status information, performance information, instructions, data, and any other information usable in the operation of processor 405, as well as circuitry usable to access such storage and/or to cause or support various operations and/or configurations associated with access to such storage.

Instruction unit 420 may include any circuitry, logic, structures, and/or other hardware, such as an instruction decoder, to fetch, receive, decode, interpret, schedule, and/or handle instructions to be executed by processor 405. Any instruction format may be used within the scope of the present disclosure; for example, an instruction may include an opcode and one or more operands, where the opcode may be decoded into one or more micro-instructions or micro-operations for execution by execution unit 430. Operands or other parameters may be associated with an instruction implicitly, directly, indirectly, or according to any other approach.

Execution unit 430 may include any circuitry, logic, structures, and/or other hardware, such as arithmetic units, logic units, floating point units, shifters, etc., to process data and execute instructions, micro-instructions, and/or micro-operations. Execution unit 430 may represent any one or more physically or logically distinct execution units. Control unit 440 may include any microcode, firmware, circuitry, logic, structures, and/or hardware to control the operation of the units and other elements of processor 405 and the transfer of data within, into, and out of processor 405. Control unit 440 may cause processor 405 to perform or participate in the performance of processes according to some embodiments, for example, by causing processor 405, using execution unit 430 and/or any other resources, to execute instructions received by instruction unit 420 and micro-instructions or micro-operations derived from instructions received by instruction unit 420. The execution of instructions by execution unit 430 may vary based on control and/or configuration information stored in storage unit 410.

MMU 450 may include any circuitry, logic, structures, and/or other hardware to manage system memory, such as providing for the virtualization of physical memory according to any desired approach and the protection of system memory. In an embodiment, MMU 450 may support the use of virtual memory to provide software, including software running in a VM, with an address space for storing and accessing code and data that is larger than the address space of the physical memory in the system, for instance, system memory 320. The virtual memory space of processor 405 may be limited only by the number of address bits available to software running on the processor, while the physical memory space of processor 405 may be limited to the size of system memory 320. MMU 450 supports a memory management scheme, such as paging, to swap the executing software's code and data in and out of system memory 320 on an as-needed basis. As part of this scheme, the software may access the virtual memory space of the processor with an un-translated address that is translated by the processor to a translated address that the processor may use to access the physical memory space of the processor.

Accordingly, MMU 450 may include translation lookaside buffer 452 in which to store translations of a virtual, logical, linear, or other un-translated address to a physical or other translated address, according to any known memory management technique, such as paging. To perform these address translations, MMU 450 may include page-walk hardware 454 to refer to one or more data structures stored in processor 405, system memory 320, storage locations in system 305 not shown in FIG. 3, and/or any combination thereof. These data structures may include page directories, page tables, and other paging data structures according to any known paging architecture. Each such paging data structure, as well as TLB 452, may include (or have associated with individual or groups of entries) one or more bits or other indicators to be used to indicate and enforce various permissions (e.g., read, write, or execute) that may define or restrict access to pages (or other regions) of memory.
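As a rough illustration of how a TLB fronts the page walk, the Python sketch below models a single-level page table with 4 KB pages and per-page permission strings. The class names, the single-level layout, and the permission encoding are simplifying assumptions; real paging architectures use multi-level tables and hardware-defined entry formats.

```python
PAGE_SHIFT = 12                     # 4 KB pages (assumed)
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Raised when no mapping exists or a permission check fails."""


class ToyMMU:
    """Hypothetical MMU model: a TLB cache in front of one page table."""

    def __init__(self, page_table):
        # page_table: virtual page number -> (physical frame number, perms)
        self.page_table = page_table
        self.tlb = {}

    def translate(self, vaddr, access="r"):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        entry = self.tlb.get(vpn)
        if entry is None:
            # TLB miss: "page walk" the in-memory table, then cache the result
            entry = self.page_table.get(vpn)
            if entry is None:
                raise PageFault(f"no mapping for page {vpn:#x}")
            self.tlb[vpn] = entry
        frame, perms = entry
        if access not in perms:     # enforce read/write/execute permissions
            raise PageFault(f"{access!r} access denied on page {vpn:#x}")
        return (frame << PAGE_SHIFT) | offset
```

With a table `{0x1: (0x80, "rw")}`, translating 0x1234 walks the table once, caches page 0x1 in the TLB, and yields 0x80234; later accesses to that page hit the TLB.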

The virtualization capabilities of a processor along with MMU 450 may provide for various approaches to creating and maintaining containers, where a container may be any execution or processing environment, created and maintained by a hypervisor, VMM, OS, or any other system or host software. Any platform, system, or machine, including the “bare metal” platform shown as system 305 in FIG. 3, as well as any VM or other container abstracted from a bare metal platform, from which one or more containers are abstracted may be referred to as a host or host machine, and each VM or other such container abstracted from a host machine may be referred to as a guest or guest machine. Accordingly, the term “host software” may generally refer to any hypervisor, VMM, OS, or any other software that may run, execute, or otherwise operate on a host machine and create, maintain, and/or otherwise manage one or more containers. The term “guest software” may generally refer to any OS, system, application, user, or other software that may run, execute, or otherwise operate on a guest machine. Note that in a layered container architecture, software may be both host software and guest software. For example, a first VMM running on a bare metal platform may create a first VM, in which a second VMM may run and create a second VM abstracted from the first VM, in which case the second VMM is both host software and guest software.

For convenience, the use of the term “container process” may mean any context, task, application, software, privileged process, unprivileged process, kernel-mode process, supervisor-mode process, user-mode process, or any other process running or runnable within a container. A container may have an address space (a container address space or a guest address space) that is different from the system address space (for example, the address space of system memory 320) or the host address space (for example, the address space of the host machine). An address with which the system address space may be directly accessed (for instance, without translation) may be referred to as a host physical address (HPA). For isolation, protection, or any other purpose, any container address space may be different from any other container address space. Therefore, each container process may access memory using addresses that are to be translated, filtered, or otherwise processed to HPAs differently than they are translated, filtered, or otherwise processed for any other container. The difference in translation/processing of container addresses may be due to virtualization and isolation of container address spaces (e.g., guest software may use guest virtual addresses (GVAs) that are translated to guest physical addresses (GPAs) that are translated to HPAs) and may also be due to the use of a variety of different types of containers (e.g., VMs, OS-managed containers, etc.) and/or different container architectures (e.g., layered architectures including VMs hosting multiple VMs, VMs hosting multiple OS-managed containers, etc.).
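The two translation stages mentioned above (GVA to GPA, then GPA to HPA) can be sketched as two lookups composed in sequence. This is a deliberately simplified model under stated assumptions: each table is a flat Python dict from page number to frame number, and the sketch ignores the fact that, in real nested paging, the guest's own page-table pages also live in GPA space and must themselves be translated.

```python
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def walk(table, addr):
    """Translate addr through a flat table {page number: frame number}."""
    page = addr >> PAGE_SHIFT
    if page not in table:
        raise LookupError(f"translation fault at page {page:#x}")
    return (table[page] << PAGE_SHIFT) | (addr & OFFSET_MASK)


def gva_to_hpa(gva, guest_table, second_level_table):
    """First level is guest-managed; second level is VMM-managed."""
    gpa = walk(guest_table, gva)
    return walk(second_level_table, gpa)
```

Two containers may use identical guest tables yet reach different HPAs, because the VMM gives each its own second-level table; that is the isolation property described above.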

An address used by a container process to access memory (a container address) may be any of many different types of addresses, including an HPA, a virtual address, a guest physical address (GPA), a guest virtual address (GVA), a DMA address, etc., and may go through one or more of any of a variety of techniques, types, levels, layers, rounds, and/or steps of translation, filtering, and/or processing, in any combination, using any of a variety of data structures (e.g., page tables, extended page tables, nested page tables, DMA translation tables, memory access filters, memory type filters, memory permission filters, etc.) to result in an HPA and/or in a fault, error, or any other type of determination that a requested access is not allowed. Various approaches may include layering and/or nesting of containers (e.g., a VMM hosting a VM running a guest OS, the guest OS supporting multiple containers; a VMM hosting multiple VMs each running a guest OS, etc.), involving various combinations of address translation techniques.

Each PF within an I/O device in system 305 may become usable and/or shareable by one or more clients (for example, containers, container processes, host processes, and/or the like) by reporting to system software the number of command interfaces (“assignable interfaces” (AIs) or “command portals”) that it supports, where a command interface is an interface through which a client (for example, a VM or VMM) may communicate with an I/O device. In various embodiments, a client may use a command interface to submit a work request to the I/O device (for example, through a portal driver). In an embodiment in which the virtualized computing device operates in an S-IOV architecture, the command interface may include an AI. For instance, an AI for a NIC (for example, NIC 346) may be a paired transmit queue and receive queue. An AI for an InfiniBand, remote DMA (RDMA), or other host fabric controller (for example, host fabric controller 316) may be a Queue Pair. An AI for a Non-Volatile Memory Express (NVMe) or other storage device controller (for example, storage device controller 349) may be a Command Queue. An AI for a graphics processing unit (GPU), general purpose computing on GPU (GPGPU), or other accelerator (for example, hardware accelerator 350) may be a schedulable context through which work may be submitted. An AI may be distinguished from an “admin portal” as being an interface for a client to submit work, whereas an admin portal is an interface through which a container host sets up or configures the AIs.

An I/O device may report to host software that it supports one or more AIs for use according to embodiments of the present specification, as well as how many AIs it supports, through capability/attribute information that it provides according to a system bus or interconnect specification (for example, through a new capability added to the Peripheral Component Interconnect Express (PCIe) specification), by a device driver for the physical function, or according to any other known technique for reporting physical function capabilities/attributes.

In some embodiments, host software may use the I/O device's admin portal to allocate, map, and/or assign each AI to a client. This assignment includes assigning a process address space identifier (PASID) to the AI, where the PASID corresponds to the address space associated with the client. In an embodiment, a PASID may be a 20-bit tag defined by the PCIe specification and carried by the transaction layer packet (TLP) prefix header in transactions generated by the I/O device. After the assignment and configuration of an AI has been completed, clients may submit work requests to the AI according to some embodiments.
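A minimal sketch of the assignment step, assuming a hypothetical `AdminPortal` object that tracks the number of AIs a device reports and binds each AI to a client and a 20-bit PASID. The class and method names are illustrative only and do not correspond to any real driver interface.

```python
from dataclasses import dataclass
from typing import Optional

PASID_MAX = (1 << 20) - 1   # a PASID is a 20-bit value


@dataclass
class AssignableInterface:
    index: int
    client: Optional[str] = None   # None until host software assigns the AI
    pasid: Optional[int] = None


class AdminPortal:
    """Hypothetical host-side view of a device's admin portal."""

    def __init__(self, num_ais):
        # num_ais: count the device reports through its capability information
        self.ais = [AssignableInterface(i) for i in range(num_ais)]

    def assign(self, client, pasid):
        if not 0 <= pasid <= PASID_MAX:
            raise ValueError("PASID must fit in 20 bits")
        for ai in self.ais:
            if ai.client is None:
                ai.client, ai.pasid = client, pasid
                return ai
        raise RuntimeError("no free assignable interface")
```

Once `assign` returns, the client would submit work through that AI, and the device would tag the resulting DMA transactions with the recorded PASID.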

In some embodiments, an S-IOV architecture, such as depicted in FIGS. 3 and 4, may include an I/O device configured to implement an AI as a light-weight version of a VF. AIs may lack certain functionality or elements associated with a VF. For instance, an AI may lack a PCI configuration space, VF base address registers (BARs), and/or an MSI-X table. In various embodiments, an AI may implement one or more pages of MMIO registers (for example, 4 KB pages of MMIO registers) that are part of the main device (PF) BARs. In some embodiments, each AI may correspond to an individual resource (for instance, a queue, context, and/or the like) and may implement the minimal MMIO interface needed to configure and operate the respective backend resource. AI accesses from a guest driver may fall into various types or categories. Non-limiting examples of AI accesses may include control path accesses (generally infrequent and, therefore, not performance critical) and fast path accesses (frequent data path accesses and, therefore, performance critical). AI control path and fast path MMIO registers may be placed in different memory pages (for example, different 4 KB pages) so that fast path registers may be mapped into the VM for direct access while control path registers may be emulated in software.
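The page-granular split between fast path and control path registers leads to a simple routing rule for guest MMIO accesses. The sketch below assumes a hypothetical BAR layout in which each AI owns two consecutive 4 KB pages, the first holding control path registers and the second holding fast path registers; actual layouts are device specific.

```python
PAGE_SIZE = 4096   # 4 KB MMIO pages


def ai_pages(ai_index):
    """Hypothetical layout: AI i owns two consecutive pages in the PF BAR,
    a control path page followed by a fast path page."""
    control = ai_index * 2 * PAGE_SIZE
    return {"control": control, "fast": control + PAGE_SIZE}


def route_access(bar_offset, assigned_ais):
    """Decide how a guest MMIO access at bar_offset is handled."""
    for ai in assigned_ais:
        pages = ai_pages(ai)
        if pages["fast"] <= bar_offset < pages["fast"] + PAGE_SIZE:
            return "direct"    # page mapped into the VM: no VM exit
        if pages["control"] <= bar_offset < pages["control"] + PAGE_SIZE:
            return "emulate"   # page not mapped: trap and emulate in software
    return "fault"             # offset belongs to no AI assigned to this VM
```

Because the split falls on page boundaries, the VMM can express it purely through which pages it maps into the VM's address space; no per-register trapping logic is needed on the fast path.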

In conventional virtualized computing environments, CPU elements (the “CPU side”) may have certain support functions and/or elements to facilitate migration and/or deployment that are lacking for devices (the “device side”). For example, to facilitate migration and/or deployment compatibility, the CPU side may use CPUID and/or CPUID virtualization to enumerate and hide CPU capabilities exposed to VMs. Such functionality may allow a VMM to create a pool of live-migration-compatible servers for a VM. However, no such architecture or functionality exists on the device side to enumerate device capabilities and to expose subsets to VMs. Accordingly, device compatibility issues exist for some VF drivers to work across different generations of devices. In another example, to facilitate memory state migration, the CPU side of conventional systems may include A/D bit support in EPT page tables to, for instance, enable a VMM to track and/or migrate only modified pages during phases of VM memory migration. Nonetheless, conventional systems lack A/D support on the device side, for instance, A/D support in IOMMU second-level page tables. Accordingly, VM pages modified by assigned VFs cannot be detected. In a further example, to facilitate device state migration, the CPU side of conventional systems may include functionality for a VMM to stop and/or suspend a VCPU, save the VCPU state, and restore the VCPU state on a physical CPU of a target physical machine. However, the device side of conventional systems does not include corresponding functionality, for instance, PF driver interfaces configured to stop and/or suspend a VF, save VF states, and restore VF states on a target physical machine. Accordingly, VMMs operating in conventional systems typically disable live migration, for instance, of VMs with assigned devices.

As described above, devices, such as SR-IOV devices, may be virtualized by the VMM assigning VFs directly to the VMs. A VF's PCI configuration space is typically virtualized by the VMM, but the MMIO BARs are mapped directly into the VM. Although the VMM may virtualize (for instance, hide or expose) PCI capabilities, the VMM cannot hide or expose VF device-specific capabilities (for example, those exposed through the MMIO space) to the VM. Accordingly, deployment and/or migration issues arise between the VF driver and multiple generations of SR-IOV devices in the data center. The VF driver may directly access and program the corresponding VF MMIO space without any further VMM involvement. Accordingly, it is more complex to migrate the VDEV state to a target VF during live VM migration, since the VF contains all of the VDEV MMIO state. In addition, if a device supports PRS and/or SVM, there may be additional state in the VF that needs to be migrated, adding to the existing complexity of migration. Certain conventional techniques, for instance, involving SR-IOV NIC devices, involve the VMM removing a NIC VF from the VM and switching the VM NIC to emulation on a source physical machine (and performing the reverse operation on the target physical machine). However, such techniques depend on packets that are lost during migration being treated as naturally lost packets that may be recovered, for instance, by software. As a result, these techniques are not generally applicable to a wide range of devices and, in particular, do not work for devices that lack software for handling lost packets.

Accordingly, some embodiments provide for VM deployment and/or migration compatibility processes generally applicable to various devices. In various embodiments, VM and/or VDEV deployment and/or migration may be achieved using a virtual environment in which a device may expose its device-specific capabilities. In some embodiments, the virtual environment may include an SR-IOV architecture, an S-IOV architecture, a combination thereof, and/or the like. In various embodiments, a device may expose its device-specific capabilities through the MMIO space, for instance, in a form that may be enumerated by a driver. In some embodiments, the driver may implement software interfaces to support VMM querying of the device-specific capabilities. Accordingly, a VMM may expose to the VM only a subset of device-specific capabilities that is supported by multiple generations of an I/O device. In this manner, a DCRM or similar element may operate to create a pool of compatible servers with devices (such as S-IOV devices) in which a VM image may be deployed and/or migrated.

In various embodiments, for performance-critical device states (for instance, VDEV states) an I/O device may implement device-specific interfaces to stop and/or suspend various elements and/or processes such as VFs and/or AIs. For example, for performance-critical VDEV states (for instance, an AI state that is in an I/O device), the I/O device may implement device-specific interfaces to stop/suspend the AIs, save AI states, and restore AI states on a target AI, device, and/or the like. In a computing environment in which a device supports PRS, for example, to facilitate SVM, some embodiments may operate to suspend VCPUs and/or VFs without affecting PRS handling. In some embodiments, a host driver may be configured to implement software interfaces for a VMM to request these operations (for instance, suspend, save, restore, and/or the like). Therefore, systems operating according to some embodiments may efficiently migrate VDEV state during VM migration, for instance, by saving AI states, migrating AI states along with non-performance-critical states (which may already be managed by a VDCM), and restoring VDEV state on the destination AIs. In various embodiments, A/D bit support may be employed for migrating VM memory state, for example, in the IOMMU second-level address translation table.

Accordingly, some embodiments may facilitate compatibility during deployment and/or migration of isolated domains (for instance, VMs, containers, and/or the like), such as via locating a compatible physical machine on which to start the domain and/or migrate a running domain in a data center environment. In addition, various embodiments may be configured to efficiently and accurately migrate domain memory state during live domain migration. Furthermore, some embodiments may be configured to efficiently and accurately migrate device state during domain migration, for example, where the domain has direct access to accelerators and/or high-performance I/O devices, such as GPU, FPGA, Cryptographic/Compression accelerators, Omnipath NICs, 100G Ethernet NICs/RDMA, and/or the like.

Although SR-IOV and S-IOV systems may be used in the described examples, embodiments are not so limited, as any type of virtualized computing environment capable of operating according to some embodiments is contemplated herein. In addition, embodiments are not limited to operating using VMs, VFs, AIs, containers, and/or the like described in various examples, as any type of domain or virtualized element capable of operating according to some embodiments is contemplated herein. Embodiments are not limited in this context.

FIG. 5 illustrates an example of an operating environment 500 such as may be representative of some embodiments. More specifically, operating environment 500 may include a virtualized computing environment implemented according to some embodiments, such as one of operating environments 100, 200, 300, and/or 400. In some embodiments, operating environment 500 may include an S-IOV architecture. In some embodiments, operating environment 500 may include a type-1 VMM architecture.

As shown in FIG. 5, operating environment 500 may include a host OS 505, a VM 515 having a guest OS 525, a VMM 530, an IOMMU 540, and/or a physical device 550. Host OS 505 may include a host driver 508 and guest OS 525 may include a guest driver 510. In some embodiments, host driver 508 may operate similar to an SR-IOV PF driver, and guest driver 510 may operate similar to an SR-IOV VF driver. In various embodiments, host OS 505 may include a VDCM 502 for a VDEV 504. In some embodiments, VDEV 504 may include virtual capability registers 506 configured to expose device (or “device-specific”) capabilities to one or more components of operating environment 500. In various embodiments, virtual capability registers 506 may be accessed by guest driver 510 of physical device 550 to determine device capabilities associated with VDEV 504. In some embodiments, one or more AIs 564 a-n may be assigned, for instance, by mapping the AIs 564 a-n into a VDEV MMIO space. Operating environment 500 may include an IOMMU 540 configured, for instance, to provide A/D bit support according to some embodiments. As shown in FIG. 5, a (slow) control path 522 may be arranged between guest driver 510 and virtual capability registers 506 and a (fast) data path 520 may be arranged between guest driver 510 and PF 560 operative on physical device 550.

In various embodiments, VMM 530 may determine a compatibility between guest driver 510 in a VM image and a physical device 550 (for instance, a target I/O device). In some embodiments, physical device 550 may implement capability registers 562, for instance, in a PF 560. In various embodiments, capability registers 562 may include MMIO capability registers. In some embodiments, capability registers 562 may be configured to expose device-specific capabilities of physical device 550 to host driver 508. In some embodiments, VDCM 502 may virtualize capability registers 562 in VDEV 504 as virtual capability registers 506 configured to expose VDEV capabilities to guest driver 510. Accordingly, VMM 530 may control which device-specific capabilities of capability registers 562 may be exposed to the VM 515 (for instance, guest driver 510). In this manner, VMM 530 may determine a subset of device-specific capabilities (a “capability subset”) depending on, for instance, which generations of physical devices VMM 530 is supporting. For example, if VMM 530 is supporting generations N−1, N, and N+1 of I/O devices in a data center, VMM 530 may select to expose a capability subset comprising device-specific capabilities that are present on all three generations of I/O devices (for instance, N−1, N, and N+1). Consequently, the same VM image (for instance, with guest driver 510) may be deployed on and/or migrated to any target physical machine containing any of the three I/O device generations (for instance, N−1, N, and N+1).
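Selecting a capability subset amounts to intersecting the capability sets of every supported generation; the virtual capability register then reports the physical register masked by that subset. The sketch below uses Python's `enum.IntFlag` with made-up capability bits; the bit assignments and function names are assumptions for illustration, not a register format from this disclosure.

```python
from enum import IntFlag
from functools import reduce


class DevCap(IntFlag):
    """Hypothetical device-specific capability bits."""
    CRC_OFFLOAD = 1 << 0
    LARGE_QUEUES = 1 << 1
    PRS = 1 << 2
    NEW_OPCODE = 1 << 3


def capability_subset(caps_by_generation):
    """Capabilities present on every supported device generation."""
    return reduce(lambda a, b: a & b, caps_by_generation.values())


def virtual_cap_register(physical_caps, subset):
    """Value the VDEV's virtual capability register reports to the guest."""
    return physical_caps & subset
```

For generations N−1, N, and N+1 that all implement `CRC_OFFLOAD | LARGE_QUEUES` but differ in the remaining bits, the guest sees the same two bits whichever machine it lands on, which is what lets one VM image migrate across the pool.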

In some embodiments, VDCM 502 may emulate non-performance-critical capabilities. For example, VDCM 502 may emulate non-performance-critical capabilities in software for certain generations of I/O devices that do not implement those capabilities in hardware. Accordingly, the capability subset may be enlarged to include such emulated capabilities, delivering additional value to guest VMs without affecting their performance. In this manner, a VMM 530 configured according to some embodiments may enumerate the device-specific capabilities of multiple generations of I/O devices and hide incompatible ones from VM 515. In various embodiments, guest driver 510 may query for the capabilities of physical device 550, for instance, located in capability registers 562. In some embodiments, guest driver 510 may query for the capabilities of physical device 550 via VDCM 502. In some embodiments, guest driver 510 may query for the capabilities of physical device 550 after VM migration. Accordingly, guest driver 510 may adapt to changes in device-specific capabilities of physical device 550.
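Extending the subset idea, a capability that the VDCM can emulate in software can be treated as available on every generation, while the VDCM records where emulation is actually required. This sketch uses plain integer bit masks and hypothetical capability names; which capabilities are safe to emulate is a per-device policy decision that this passage does not specify.

```python
# Hypothetical capability bits
CRC_OFFLOAD = 0x1
LARGE_QUEUES = 0x2
STATS_COUNTERS = 0x4   # assumed non-performance-critical, so emulatable


def plan_exposure(hw_caps_by_gen, emulatable):
    """Return (exposed capability mask, per-generation emulation mask)."""
    exposed = ~0                      # start from "every capability"
    for hw in hw_caps_by_gen.values():
        exposed &= hw | emulatable    # keep caps each generation can offer
    # capabilities the VDCM must emulate on each generation
    emulate = {gen: exposed & ~hw for gen, hw in hw_caps_by_gen.items()}
    return exposed, emulate
```

Here `STATS_COUNTERS` becomes part of the exposed set even though generation N−1 lacks it in hardware; the returned emulation mask tells the VDCM to service it in software only on that generation.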

In various embodiments, VM memory migration may be handled via A/D bits 542 in IOMMU 540. In some embodiments, A/D bit support may be associated with IOMMU 540 second-level address translation tables (for instance, DMA remapping page tables). In various embodiments, A/D bits 542 may be updated automatically by DMA remapping hardware page walks, for example, allowing CPU MMUs and/or DMA remapping hardware to share a common set of page tables. In this manner, VMM 530 may use A/D bits 542 in virtualization page tables to detect pages dirtied by writes from guest OS 525 software and/or from assigned devices, such as VFs and/or AIs.
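The dirty-page tracking loop that A/D bits enable can be sketched as follows. The entry layout and function names are hypothetical stand-ins for second-level page-table entries and for the hardware's bit updates; a real VMM would also need to invalidate the IOTLB after clearing dirty bits so that subsequent DMA writes set them again.

```python
from dataclasses import dataclass


@dataclass
class SecondLevelPTE:
    """Hypothetical second-level entry with accessed/dirty bits."""
    host_frame: int
    accessed: bool = False
    dirty: bool = False


def dma_write(table, guest_frame):
    """Model remapping hardware setting A/D bits when a DMA write hits an entry."""
    pte = table[guest_frame]
    pte.accessed = True
    pte.dirty = True


def harvest_dirty(table):
    """One pre-copy round: return dirty guest frames and clear their D bits."""
    dirty = sorted(gfn for gfn, pte in table.items() if pte.dirty)
    for gfn in dirty:
        table[gfn].dirty = False
    return dirty
```

After DMA writes to guest frames 2 and 0, one harvest returns `[0, 2]`, so only those frames are re-copied in that round; a second harvest with no intervening writes returns an empty list.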

To implement VM migration supporting migration of VDEV state, a physical device 550 configured according to some embodiments may implement interfaces for VMM 530 to request various migration processes and/or process steps. Non-limiting examples of such migration process steps may include requesting AI suspend, waiting for AI suspend complete, requesting AI restore, and/or waiting for AI restore complete. In some embodiments, VMM 530 may provide an AI (or command interface) suspend request message or signal and/or an AI restore request message or signal. In various embodiments, VMM 530 may receive an AI suspend complete message or signal from physical device 550 responsive to an AI being suspended (for instance, an AI specified in the AI suspend request message). In some embodiments, VMM 530 may receive an AI restore complete message from physical device 550 responsive to completion of an AI restore process (for instance, of an AI specified in an AI restore request message).
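The request/complete handshake can be sketched from the VMM's side as below. `Device`, its methods, and the event list are hypothetical stand-ins for whatever MMIO registers, I/O commands, or host-memory structures a device and host driver actually use. Suspend completes immediately in this model, whereas a real device would first quiesce in-flight work.

```python
class Device:
    """Toy device exposing per-AI suspend/save/restore (hypothetical API)."""

    def __init__(self):
        self.running = {}   # AI id -> bool
        self.ai_regs = {}   # AI id -> device-internal AI state
        self.events = []    # completion messages posted to the VMM

    def request_suspend(self, ai):
        # a real device would drain or capture in-flight work first
        self.running[ai] = False
        self.events.append(("suspend_complete", ai))

    def save_state(self, ai):
        if self.running.get(ai, False):
            raise RuntimeError("AI must be suspended before its state is saved")
        return dict(self.ai_regs[ai])

    def request_restore(self, ai, migration_state):
        self.ai_regs[ai] = dict(migration_state)
        self.running[ai] = True
        self.events.append(("restore_complete", ai))


def wait_for(device, event):
    """Stand-in for polling a status register or taking an interrupt."""
    if event not in device.events:
        raise TimeoutError(f"{event} not signalled")
    device.events.remove(event)


def migrate_ai(src, src_ai, dst, dst_ai):
    """Suspend, save, restore, and confirm, in the order listed above."""
    src.request_suspend(src_ai)
    wait_for(src, ("suspend_complete", src_ai))
    state = src.save_state(src_ai)
    dst.request_restore(dst_ai, state)
    wait_for(dst, ("restore_complete", dst_ai))
    return state
```

Note that the target AI index need not match the source index; the saved state is device-defined and is simply replayed into whichever AI the target machine assigns.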

In some embodiments, physical device 550 may generate a migration status element that includes migration state information associated with VM 515. In some embodiments, a migration status element may include an AI migration state that includes, for instance, all of the AI state information required to resume a new AI to the same operational state as the original AI 564 a-n. In some embodiments, host driver 508 may save and restore the AI migration state to/from the AI 564 a-n. In some embodiments, physical device 550 and host driver 508 may implement interfaces using various processes and/or elements, including, without limitation, memory registers (for instance, MMIO registers), I/O commands, host memory, and/or the like.

Various embodiments may be configured to handle outstanding I/O commands during VM deployment and/or migration. For instance, during the last phase of live migration, when VM 515 is suspended, VM 515 may have outstanding I/O commands in physical device 550. Handling of such outstanding I/O commands may be device specific and may be abstracted from VMM 530 using various interfaces, such as an AI suspend interface, an AI migration state interface, and/or an AI restore interface. For example, in some embodiments, physical device 550 may wait for all outstanding I/O commands from VM 515 to complete during an AI suspend process. Accordingly, the AI migration state may not contain any outstanding I/O commands. In this manner, an AI restore process associated with VM 515 may be simpler and more efficient. In another example, physical device 550 may wait to complete only outstanding I/O commands that are already in a processing pipeline of physical device 550 during an AI suspend process. For instance, physical device 550 may include the queued I/O commands (for example, those still waiting to be processed) in an AI (or command interface) migration state process or element (for example, an information element). Accordingly, host driver 508 may save the outstanding I/O commands as part of an AI migration state process or element and migrate them to the target physical machine. In some embodiments, an AI restore process on the target physical device may re-issue the outstanding I/O commands in the AI migration state process or element to the newly assigned AIs 564 a-n such that the outstanding I/O commands may be processed on the target physical device post-migration. In this manner, an AI suspend process may finish faster, thereby reducing downtime for VM 515 during a live migration process.
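The two suspend policies differ only in which outstanding commands end up in the AI migration state. This sketch models the device pipeline and the not-yet-started queue as `collections.deque` objects; the function names and the `drain_all` flag are illustrative assumptions rather than a defined interface.

```python
from collections import deque


def suspend_ai(pipeline, queue, drain_all):
    """Suspend an AI and return (completed commands, saved commands).

    pipeline: deque of commands the device has already started processing
    queue: deque of submitted commands still waiting to be processed
    drain_all=True: finish everything first (simpler restore)
    drain_all=False: finish only in-pipeline work, save the rest (faster suspend)
    """
    completed = list(pipeline)
    pipeline.clear()
    if drain_all:
        completed.extend(queue)
        saved = []
    else:
        saved = list(queue)
    queue.clear()
    return completed, saved


def restore_ai(target_queue, saved):
    """Re-issue saved commands to the newly assigned AI on the target."""
    target_queue.extend(saved)


if __name__ == "__main__":
    pipeline, queue = deque(["read A", "write B"]), deque(["write C"])
    print(suspend_ai(pipeline, queue, drain_all=False))
```

The trade-off mirrors the text: draining everything keeps the migration state free of commands, while saving the queue shortens the suspend and therefore the VM's downtime.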

In various embodiments, physical device 550 may support PRS and/or SVM and, therefore, physical device 550 may generate PRS requests during processing of I/O commands. In a conventional system, if a page fault is in the first-level page table, the PRS request may need to be handled by the VM. Since a VMM stops the VM (for instance, VCPUs) before issuing an AI suspend message, any PRS requests during processing of outstanding I/O commands that require handling by the guest VM cannot be processed. If the VMM resumes the VCPUs to handle the PRS, the VM may submit more I/O commands, which in turn may generate more PRS requests. Accordingly, in a conventional system, AI suspend processes may take arbitrarily long to complete and, therefore, essentially prevent live VM migration.

Accordingly, in some embodiments in which physical device 550 supports PRS and/or SVM, when physical device 550 encounters an I/O page fault and needs to perform PRS for an AI in an AI suspend process, physical device 550 may save the execution state of the corresponding I/O command as part of an AI migration state process or element and provide it to VMM 530 via host driver 508. In this manner, the I/O command may be migrated to the target physical machine in the AI migration state process or element. On the target physical machine, host driver 508 may re-issue saved I/O commands to new AIs 564 a-c as part of the AI restore process. Host driver 508 may resume AI operation while VMM 530 may resume VCPUs of VM 515, and physical device 550 may send PRS requests to VMM 530.

In various embodiments in which physical device 550 supports PRS and/or SVM, physical device 550 may send PRS requests for an AI 564 a-n to VMM 530 even after the AI suspend process has started. VMM 530 may determine whether VMM 530 should handle the PRS request or forward the PRS request to VM 515. In various embodiments, VMM 530 may make this determination using the PASID and address in the PRS request and walking the first-level page table. For example, if nested translation is not set up for this PASID, or if the fault is not in the first-level page table, VMM 530 may handle the PRS request and return a response. In another example, if nested translation is set up for the PASID and the fault is in the first-level page table, the PRS request may be forwarded or otherwise provided to VM 515 for further handling.
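The routing decision can be written as a small function. The sketch models the first-level (guest) and second-level (VMM) tables as flat dicts keyed by page number, and the PASIDs with nested translation as a Python set; all of these are simplifying assumptions, and a real VMM would walk hardware-format tables.

```python
PAGE_SHIFT = 12


def fault_level(addr, first_level, second_level):
    """Report which level of a toy nested translation is missing, if any."""
    gfn = first_level.get(addr >> PAGE_SHIFT)
    if gfn is None:
        return "first"
    if gfn not in second_level:
        return "second"
    return None


def route_prs(pasid, addr, nested_pasids, first_level, second_level):
    """Decide whether the VMM services a page request or forwards it."""
    if pasid not in nested_pasids:
        return "vmm"             # no guest-managed table for this PASID
    if fault_level(addr, first_level, second_level) == "first":
        return "forward_to_vm"   # only the guest can fix its own page table
    return "vmm"                 # second-level (or no) fault: VMM responds
```

Only the forwarded case requires any guest VCPU to run, which is why it is the case that triggers the fast-path unmapping described next.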

In various embodiments, if the PRS request is provided to VM 515 for further handling, VMM 530 may unmap fast path registers of assigned devices of VM 515 so that any fast path access (for instance, including work submissions) from VM 515 may cause a VM exit to the VDCM instead of going directly to the physical device 550. The VCPU that the PRS interrupt is directed to may be resumed. In some embodiments, VMM 530 may also resume other VCPUs, either immediately and/or if/when the running VCPU sends an IPI (inter-processor interrupt) to another VCPU. If VM 515 exits due to work submissions occurring before the AI suspend process completes, work commands to the AI may be queued within the VDCM. Once the AI suspend process completes, VCPUs may be suspended (or suspended again). The queued work commands may be included as part of the AI migration state element sent to the target physical machine.
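A toy model of the fast-path unmapping step: while the fast path is mapped, submissions reach the device directly; once the VMM unmaps it, each submission traps to the VDCM and is queued for inclusion in the migration state. The `ToyVDCM` class and its methods are hypothetical and only capture the bookkeeping, not the VM-exit mechanics.

```python
class ToyVDCM:
    """Hypothetical model of VDCM handling of work submissions."""

    def __init__(self):
        self.fast_path_mapped = True
        self.device_queue = []   # work that reached the physical AI directly
        self.pending = []        # work trapped and held by the VDCM

    def unmap_fast_path(self):
        # called when a PRS request is forwarded to the VM during suspend
        self.fast_path_mapped = False

    def submit(self, cmd):
        if self.fast_path_mapped:
            self.device_queue.append(cmd)   # direct doorbell write, no VM exit
        else:
            self.pending.append(cmd)        # VM exit, queued in the VDCM

    def migration_state(self):
        """Queued work carried to the target machine with the AI state."""
        return list(self.pending)
```

Because trapped submissions never reach the device, the AI suspend can complete even while guest VCPUs run to service the forwarded page request.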

In a conventional SR-IOV architecture, each VF may be allocated device-specific backend resources (for instance, backend resources 272 a-n in FIG. 2). VF MMIO space is mapped to the backend resources for the VF to function. As the VF driver directly accesses the VF MMIO space without VMM involvement, an SR-IOV device must implement mapping from the VF MMIO space to the backend resources. Certain SR-IOV devices may avoid static mapping between VFs and backend resources. Such SR-IOV devices may implement complex re-mapping logic (for instance, device resource remapping logic 270 of FIG. 2) so that the VMM may dynamically map or allocate backend resources to VFs, for example, just before assigning VFs to VMs. However, implementing such resource management logic in an I/O device increases the cost and complexity of the SR-IOV device. Accordingly, typical I/O devices avoid implementing fine-grained resource management on the I/O device in order to reduce cost and complexity of the device. For example, an I/O device may operate by statically allocating backend resources, such as queues, to a VF in sets of a certain size (such as sets of 4) instead of at single-queue granularity. In another example, an I/O device may not allow changing VF resources if the VF is reassigned to a different VM.
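The cost of coarse-grained static allocation is easy to quantify. The sketch below, using hypothetical numbers and a made-up function name, grants each VF's queue request rounded up to the allocation granularity, in order, until the device runs out of queues.

```python
def allocate_queues(total_queues, requests, granularity):
    """Grant requests in order; return (granted sizes, queues left unused)."""
    used, granted = 0, []
    for req in requests:
        need = -(-req // granularity) * granularity   # round up to a whole set
        if used + need > total_queues:
            break                                     # device exhausted
        used += need
        granted.append(need)
    return granted, total_queues - used
```

With 16 queues and requests of 1, 1, 1, and 5 queues, sets of 4 satisfy only the first three VFs (12 queues consumed for 3 requested), whereas single-queue granularity satisfies all four requests and still leaves 8 queues free.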

A VMM cannot overcommit backend resources to VMs because VF drivers are able to directly access the VF MMIO space without VMM involvement. Accordingly, if VF1 is assigned to VM1, VF1's backend resources cannot be reassigned to VM2 while VM1 is still running. However, it may be desirable to reassign VF1 to VM2, for instance, based on VM resource requirements. Non-limiting examples of resource requirements may include idle status, process requirements (for instance, certain high-performance processes, such as high-performance VF processes), resource requirements, resource priority, and/or the like.

For example, it may be desirable to reassign VF1 to VM2 if VM1 is idle, doesn't need certain processes, such as high-performance VF processes, has a lower resource requirement than VM2, and/or the like. Accordingly, the inability to overcommit an SR-IOV device to VMs severely reduces the flexibility and efficiency of device resource management in conventional SR-IOV architectures.

With existing SR-IOV devices, over-committing memory of VMs with assigned devices is substantially limited, for instance, as conventional approaches of memory over-commit do not work for certain processes, such as DMA accesses to VMs (for example, DMA that assumes pinned memory). Such illustrative conventional approaches may include copy-on-write or VMM paging techniques. Existing solutions may either disable memory overcommit for VMs with assigned devices or attempt to operate using alternate approaches (for instance, guest memory ballooning) that are not applicable or are inefficient for hyper-scale and/or high-density usages.

Certain SR-IOV devices, for instance SVM-capable devices, may support recoverable I/O page faults for DMA through the use of the PCIe-standard-defined PRS capability and suitable DMA remapping hardware support. However, even when a device supports PRS, conventional implementations of PRS do not require the device to support recoverable page faults on all DMA accesses. Rather, the device and the corresponding driver may together decide which device DMA accesses can recover from page faults. Accordingly, using PRS capabilities transparently by the VMM (for instance, without device-specific involvement) for demand paging is impractical if not impossible for conventional PRS-capable devices. As such, the inability to overcommit memory to VMs reduces memory management flexibility in VMMs for existing SR-IOV architecture systems.

In conventional SR-IOV systems, the software architecture allows a single VDEV to include only a single VF, as the VF MMIO space is designed to support a single VDEV, with VF drivers directly accessing the VF MMIO space. However, under certain usage conditions, it may be more efficient and effective for a VMM to assign multiple VFs from multiple SR-IOV devices to a VM using a single VDEV. In addition, a VMM may gain the flexibility to compose “heterogeneous” VDEVs from multiple heterogeneous physical devices. For example, a VM may require a certain number of queues for its VDEV; however, the corresponding SR-IOV device cannot allocate the required number of queues to the VF because of a lack of available queues on the device. In a “heterogeneous” system, for instance, an independent hardware vendor (IHV) that develops both a NIC and an FPGA device may seek to create a virtual NIC with an on-board FPGA device for network acceleration usages instead of actually building such a physical device. However, the architecture and/or software of conventional SR-IOV systems does not provide the flexibility to create a single VDEV that consists of more than one VF, either from the same SR-IOV device or from multiple (and potentially heterogeneous) SR-IOV devices.

Accordingly, in some embodiments, an I/O device and/or VMMs may be configured such that the VMMs may use emulation to provide resource management functionalities that overcome the deficiencies of conventional SR-IOV systems, such as mapping VDEVs directly to device backend resources, emulating MMIO interfaces to implement VMM memory overcommit, dynamically switching VDEV paths (for instance, fast paths) between emulation and direct access to implement device overcommit, and/or composing a VDEV from AIs on multiple I/O devices.

FIG. 6 illustrates an example of an operating environment 600 such as may be representative of some embodiments. More specifically, operating environment 600 may include a virtualized computing environment implemented according to some embodiments, such as one of operating environments 100, 200, 300, 400, and/or 500. In some embodiments, operating environment 600 may include an S-IOV architecture. In some embodiments, operating environment 600 may include a type-1 VMM architecture.

As shown in FIG. 6, operating environment 600 may include a host 605 having a host driver 608 and VMs 615 a-n having guest drivers 610 a-n. An S-IOV device 630 may include a PF 640 having a PF BAR space 650 and/or a PF configuration space 652. As opposed to the SR-IOV device 230 of FIG. 2, which has VFs 232 a-n with VF BAR spaces 260 a-n and VF configuration spaces 262 a-n, S-IOV device 630 may include a PF 640 implementing AI MMIO (or command interface MMIO) spaces 670 a-n, for instance, as part of PF BARs 650. In some embodiments, VF BAR spaces 260 a-n and VF configuration spaces 262 a-n may be incorporated in VDCM 602, for example, which may emulate VF BAR spaces 260 a-n and VF configuration spaces 262 a-n in software. For example, VDCM 602 may emulate VDEVs 604 a-n, each of which is assigned to a VM 615 a-n. In various embodiments, AIs 606 a-n may be assigned to a VM 615 a-n, for instance, by mapping AIs 606 a-n into the VDEV 604 a-n MMIO space.

Guest drivers 610 a-n may access the VDEV 604 a-n MMIO space to operate S-IOV device 630 functions. VDEV 604 a-n fast-path MMIO registers may be directly mapped to the AI MMIO 670 a-n registers, for instance, so that guest drivers 610 a-n may directly access AI MMIOs 670 a-n without any virtualization overhead. VDEV 604 a-n control-path MMIO registers may be left unmapped so that accesses by guest drivers 610 a-n to the control-path MMIO registers generate a trap into VMM 620 (for instance, as represented by lines 680), which may be forwarded to VDCM 602 for emulation.
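The split between direct-mapped fast-path registers and trapped control-path registers can be modeled as a simple routing decision. The following Python sketch is a toy model only, assuming 4 KiB MMIO pages; `VdevMmio`, `guest_write`, and the `emulate` callback are hypothetical names, and in a real system the routing is performed by EPT hardware, not by a software lookup.

```python
PAGE_SIZE = 4096  # assumption: MMIO is managed at 4 KiB page granularity


class VdevMmio:
    """Toy model of one VDEV MMIO space (hypothetical, not a real VMM API)."""

    def __init__(self, fast_path_pages, emulate):
        # Page index -> dict standing in for the AI MMIO registers that the
        # EPT maps directly into the guest (fast path).
        self.fast_path = {page: {} for page in fast_path_pages}
        # Callback standing in for VDCM emulation of the control path.
        self.emulate = emulate
        self.traps = 0

    def guest_write(self, offset, value):
        page, reg = divmod(offset, PAGE_SIZE)
        if page in self.fast_path:
            # Direct-mapped page: the write reaches the AI MMIO register
            # without a VM exit.
            self.fast_path[page][reg] = value
            return "direct"
        # Unmapped control-path page: the access traps into the VMM, which
        # forwards it to the VDCM for emulation.
        self.traps += 1
        return self.emulate(offset, value)


vdev = VdevMmio(fast_path_pages=[1], emulate=lambda off, val: f"emulated@{off:#x}")
print(vdev.guest_write(0x1008, 5))  # fast-path write
print(vdev.guest_write(0x0010, 1))  # control-path write, trapped
```

Keeping only the fast-path pages mapped means the common data path costs nothing extra, while every control-path access remains visible to the VDCM.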

In some embodiments, the MMIO space of S-IOV device 630 may be configured such that each backend resource 672 a-n is mapped to an AI MMIO 670 a-n, for instance, to remove the resource remapping logic from S-IOV device 630 (for example, device resource remapping logic 270 of SR-IOV device 230). Each AI MMIO 670 a-n may be mapped to a backend resource 672 a-n, such as a backend queue. For instance, if S-IOV device 630 implements 2000 backend queues, the MMIO space of S-IOV device 630 may be configured to support 2000 AI MMIOs 670 a-n, each mapping to a corresponding backend queue.
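A one-to-one layout like the 2000-queue case above makes address decoding trivial, which is why no remapping logic is needed in the device. The sketch below assumes, purely for illustration, that each AI MMIO occupies one 4 KiB page laid out contiguously in the PF BAR; the stride, the BAR base address, and the function names are hypothetical.

```python
AI_MMIO_STRIDE = 4096  # assumption: one 4 KiB page per AI MMIO in the PF BAR


def ai_mmio_base(bar_base, ai_index, num_ais):
    """Address of the AI MMIO page that fronts backend queue `ai_index`."""
    if not 0 <= ai_index < num_ais:
        raise ValueError("no such AI")
    return bar_base + ai_index * AI_MMIO_STRIDE


def backend_queue_for(bar_base, addr, num_ais):
    """Decode a BAR address to its backend queue; AI i backs queue i (1:1)."""
    offset = addr - bar_base
    ai_index = offset // AI_MMIO_STRIDE
    if offset < 0 or ai_index >= num_ais:
        raise ValueError("address outside the AI MMIO region")
    return ai_index


NUM_QUEUES = 2000
bar = 0xF000_0000  # hypothetical PF BAR base address
print(hex(ai_mmio_base(bar, 1999, NUM_QUEUES)))
print(backend_queue_for(bar, bar + 1999 * AI_MMIO_STRIDE + 0x10, NUM_QUEUES))
```

Under these assumptions, 2000 AIs occupy 2000 × 4 KiB = 8,192,000 bytes of BAR space.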

In various embodiments, with VDEVs 604 a-n implemented by VDCM 602 and each AI MMIO 670 a-n corresponding to a backend resource, resource mapping may be performed by VDCM 602. In some embodiments, VDCM 602 may assign a desired number of backend resources 672 a-n to VMs 615 a-n by allocating an equivalent number of AIs 606 a-n to VDEVs 604 a-n. Allocated AI MMIO 670 a-n fast-path registers may be mapped directly into VMs 615 a-n using the CPU's second-level translation tables, for instance, EPTs 624 a-n. Accordingly, instead of using resource remapping logic in S-IOV device 630 (for example, device resource remapping logic 270 of SR-IOV device 230), VDCM 602 may use EPTs 624 a-n to dynamically map desired backend resources 672 a-n to VMs 615 a-n. In addition, VMM 620 may have better control over managing S-IOV device resources, such as backend resources 672 a-n. For example, VMM 620 may assign device resources to VMs 615 a-n at much finer granularity without increasing device complexity. More specifically, as VDCM 602 is a software component, VMM 620 may dynamically create VDEVs 604 a-n with arbitrary resource management.
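The allocation step described above, in which the VDCM hands out AIs and records a second-level mapping for each fast-path page, can be sketched as follows. This is a minimal model under stated assumptions: the EPT is represented as a plain dictionary from guest page frame to host page frame, AI MMIO pages are assumed to be contiguous starting at `ai_bar_pfn`, and the `Vdcm` class and its methods are hypothetical.

```python
class Vdcm:
    """Toy VDCM: hands out AIs and records EPT-style fast-path mappings."""

    def __init__(self, num_ais, ai_bar_pfn):
        self.free_ais = list(range(num_ais))  # each AI backs one backend queue
        self.ai_bar_pfn = ai_bar_pfn          # host page frame of AI 0's MMIO page
        self.ept = {}                         # vm -> {guest pfn: host pfn}

    def assign(self, vm, count, guest_base_pfn):
        """Give `vm` `count` backend resources by allocating `count` AIs."""
        if count > len(self.free_ais):
            raise RuntimeError("not enough free AIs")
        ais, self.free_ais = self.free_ais[:count], self.free_ais[count:]
        table = self.ept.setdefault(vm, {})
        for i, ai in enumerate(ais):
            # Map the i-th guest fast-path page onto this AI's MMIO page.
            table[guest_base_pfn + i] = self.ai_bar_pfn + ai
        return ais

    def release(self, vm, guest_pfns):
        """Unmap fast-path pages and return their AIs to the free pool."""
        table = self.ept[vm]
        for pfn in guest_pfns:
            self.free_ais.append(table.pop(pfn) - self.ai_bar_pfn)


vdcm = Vdcm(num_ais=8, ai_bar_pfn=0x80000)
print(vdcm.assign("vm1", 3, guest_base_pfn=0x100))  # three AIs for vm1
print(vdcm.assign("vm2", 2, guest_base_pfn=0x100))  # two AIs for vm2
```

Because the resource-to-VM binding lives only in these software tables, changing a VM's share is a table update rather than a device reconfiguration.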

In some embodiments, a PRS-enabled S-IOV device 630 may include an architecture, such as a software architecture, that enables host drivers to enumerate or indicate a page-fault status of an AI 606 a-n. In some embodiments, a page-fault status may indicate whether an AI 606 a-n is page-fault capable, such as being fully page-fault capable (for instance, if all DMA accesses from AI 606 a-n to system-software-managed memory are capable of recoverable page faults), partially page-fault capable, not page-fault capable, or the like. If AIs 606 a-n are fully page-fault capable, VMM 620 may support full memory overcommit for VMs 615 a-n with virtual devices backed by page-fault capable AIs 606 a-n. Host driver 608 may enumerate the full page-fault handling capability using one or more device-specific techniques according to some embodiments, such as using an MMIO capability register.

In various embodiments, AIs 606 a-n may be only partially page-fault capable. For example, a portion of the DMA accesses generated by AIs 606 a-n may not be page-fault capable, while another portion may be page-fault capable (for instance, a vast majority may be page-fault capable). For partially page-fault capable AIs 606 a-n, the VDEV 604 a-n MMIO interface may be defined to require explicit indication/registration of such pages by guest drivers 610 a-n. In this manner, VDCM 602 may be enabled to intercept and detect such “pinned” guest pages and ensure that they are also pinned by VMM 620 in EPTs 624 a-n. An example of such usage may include a graphics S-IOV device in which a small subset of pages may need to be pinned in memory and may be required to be explicitly registered, for instance, using a graphics translation table (for instance, virtualized in software to reflect the pinning to VMM virtualization page tables), while the vast majority of the guest pages can be demand-paged by the VMM because device accesses to such pages may be subject to recoverable page faults.
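The three page-fault statuses lead to three different pinning policies for the VMM. The following Python sketch states that decision explicitly; the `PageFaultStatus` enum and `pages_to_pin` function are hypothetical, and the set of registered pages stands in for whatever device-specific registration interface (such as a graphics translation table) a real VDEV would expose.

```python
from enum import Enum


class PageFaultStatus(Enum):
    FULL = "fully page-fault capable"
    PARTIAL = "partially page-fault capable"
    NONE = "not page-fault capable"


def pages_to_pin(status, guest_pages, registered_pinned):
    """Guest pages the VMM must keep pinned in the EPT for one VDEV."""
    guest_pages = set(guest_pages)
    if status is PageFaultStatus.FULL:
        # Every DMA access can recover from a fault: full memory overcommit.
        return set()
    if status is PageFaultStatus.PARTIAL:
        # Only the pages the guest driver registered through the VDEV MMIO
        # interface; everything else may be demand-paged.
        return guest_pages & set(registered_pinned)
    # No recoverable faults: all guest memory must stay pinned.
    return guest_pages


print(sorted(pages_to_pin(PageFaultStatus.PARTIAL, range(8), {2, 3, 42})))
```

The partial case is where the savings come from: only a small registered subset is pinned, and the rest of guest memory remains eligible for overcommit.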

In various embodiments, VMM 620 may overcommit AIs 606 a-n to VMs 615 a-n using VDCM 602. For example, VDCM 602 may use EPTs 624 a-n to map AIs 606 a-n to VMs 615 a-n. VDCM 602 may dynamically map/unmap AI MMIOs 670 a-n in VMs 615 a-n. In this manner, VDCM 602 may be able to dynamically assign/reassign (or “steal”) AIs 606 a-n from one VDEV 604 a-n and allocate such AIs 606 a-n to another VDEV 604 a-n without the knowledge of VMs 615 a-n. In various embodiments, the fast-path MMIO of a VDEV 604 a-n that loses an AI 606 a-n (for instance, a “victim VDEV”) may be dynamically switched from direct-mapped to “trap-and-emulate,” similar to the control-path MMIO of such a VDEV 604 a-n. Accordingly, AIs 606 a-n of victim VDEV 604 a-n may be fully emulated in software by VDCM 602. In this manner, VMM 620 may operate with the flexibility to overcommit S-IOV device 630 to VMs 615 a-n by dynamically allocating AIs 606 a-n to active or higher-priority VMs and freeing AIs 606 a-n from inactive or lower-priority VMs.
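A sketch of the “steal” operation follows, assuming a VDEV can be modeled as a set of guest-visible AI slots, each either backed by a physical AI (direct-mapped) or emulated by the VDCM. The `Vdev` class and `steal_ai` function are hypothetical illustrations; a real VDCM would also quiesce the AI and save its state before handing it over, which this model omits.

```python
class Vdev:
    """Toy VDEV: each guest-visible AI slot is direct-mapped or emulated."""

    def __init__(self, name, ais):
        self.name = name
        # Slot -> physical AI index, or None once the VDCM emulates the slot.
        self.slots = dict(enumerate(ais))

    def mode(self, slot):
        return "direct" if self.slots[slot] is not None else "trap-and-emulate"


def steal_ai(victim, beneficiary):
    """Move one directly mapped AI from `victim` to `beneficiary`."""
    for slot, ai in victim.slots.items():
        if ai is not None:
            # Unmap the victim's fast path first so later guest accesses
            # trap and are emulated in software; the guest still sees the slot.
            victim.slots[slot] = None
            beneficiary.slots[len(beneficiary.slots)] = ai
            return ai
    raise ValueError("victim has no directly mapped AI to give up")


idle = Vdev("vm-idle", [4, 5])
busy = Vdev("vm-busy", [7])
print(steal_ai(idle, busy), idle.mode(0), idle.mode(1), busy.slots)
```

The victim's slot count never changes, which is what keeps the reassignment invisible to the guest driver.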

In some embodiments, VDCM 602 may map multiple AIs 606 a-n from multiple S-IOV devices 630 to a VM 615 a-n as a single VDEV 604 a-n, generating a VDEV 604 a-n that is composed of AIs 606 a-n from multiple S-IOV devices 630. In some embodiments, such multiple S-IOV devices 630 may include devices of the same make, model, and/or generation. Guest drivers 610 a-n may access all AIs 606 a-n through the VDEV 604 a-n MMIO space. In various embodiments, AIs 606 a-n may belong to the same type of S-IOV device (“homogeneous VDEV”). In various embodiments, AIs 606 a-n may belong to different types of S-IOV devices (“heterogeneous VDEV”). In some embodiments, if usage requires AIs 606 a-n (belonging to the same VDEV 604 a-n) to communicate directly to emulate a single VDEV 604 a-n, S-IOV devices 630 may implement the communication and provide the required interfaces for associated software. For example, guest driver 610 a-n may configure an AI 606 a-n on one device to perform DMA to another AI 606 a-n of another S-IOV device 630 as part of VDEV I/O commands. For example, such a configuration may be used by VMM 620 to compose a VDEV 604 a-n for which the total required number of AIs 606 a-n is not available on any single S-IOV device 630.
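Composing a VDEV when no single device has enough free AIs can be expressed as a small selection routine. The sketch below is one possible policy, assumed for illustration: it prefers a single device when one can satisfy the request and otherwise spans devices, largest free pool first. The function name and device labels are hypothetical.

```python
def compose_vdev(required, free_by_device):
    """Return (device, AI) pairs for one VDEV.

    Assumed policy: prefer a single device; otherwise span devices,
    largest free pool first.
    """
    for dev, ais in free_by_device.items():
        if len(ais) >= required:
            return [(dev, ai) for ai in ais[:required]]
    picked = []
    # sorted(..., reverse=True) stays stable, so ties keep their input order.
    for dev, ais in sorted(free_by_device.items(),
                           key=lambda kv: len(kv[1]), reverse=True):
        for ai in ais:
            if len(picked) == required:
                break
            picked.append((dev, ai))
    if len(picked) < required:
        raise RuntimeError("not enough free AIs across all devices")
    return picked


print(compose_vdev(3, {"nic0": [0, 1], "nic1": [10, 11]}))  # homogeneous, spans devices
print(compose_vdev(2, {"nic0": [0], "fpga0": [3]}))         # heterogeneous
```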

The ability to create heterogeneous virtual devices may provide IHVs with, among other things, the flexibility to reuse separate I/O devices and/or to provide a single driver to a VM as if it were a heterogeneous physical device. Component I/O devices may be designed to perform joint functionality on a heterogeneous device configured according to some embodiments. For instance, in a virtual “NIC with on-board FPGA,” guest software may program the NIC AI to receive a network packet, and the FPGA AI may perform various operations on the packet. A guest driver may configure or program the NIC AI to send the received packet to the FPGA AI according to some embodiments.

Included herein is a flow chart representative of exemplary methodologies for performing novel aspects of the disclosed architecture. While, for purposes of simplicity of explanation, the one or more methodologies shown herein, for example, in the form of a flow chart or flow diagram, are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance therewith, occur in a different order and/or concurrently with other acts from that shown and described herein. For example, those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram. Moreover, not all acts illustrated in a methodology may be required for a novel implementation.

FIG. 7 depicts an illustrative logic flow according to a first embodiment. More specifically, FIG. 7 illustrates one embodiment of a logic flow 700. Logic flow 700 may be representative of some or all of the operations executed by one or more embodiments described herein for live VM migration. For example, logic flow 700 may illustrate operations performed by operating environments 100, 200, 300, 400, and/or 500.

In the illustrated embodiment shown in FIG. 7, logic flow 700 may initiate VM migration at block 710. For example, live migration of VM 515 from a source physical machine to a target physical machine (destination device) may be initiated within operating environment 500. At block 712, logic flow 700 may enable EPT A/D bits. For instance, EPT A/D bits may be enabled for all VCPUs of VM 515, which is being migrated. In some embodiments, if the EPT A/D bits are already enabled, the D-bits may be reset. At block 714, logic flow 700 may enable IOMMU A/D bits. For instance, IOMMU A/D bits may be enabled for all VCPUs of VM 515, which is being migrated. In some embodiments, if the IOMMU A/D bits are already enabled, the D-bits may be reset. Logic flow 700 may copy VM memory pages at block 716. For instance, while VCPUs and/or VDEVs of VM 515 are active, a copy (for instance, a “snapshot” copy) of all VM memory pages may be generated and copied to the migration destination, such as a target physical machine.

At block 720, logic flow 700 may copy modified memory pages. For instance, logic flow 700 may find EPT modified memory pages at block 722 and find IOMMU modified memory pages at block 724. In some embodiments, EPT modified memory pages may include memory pages that the EPTs indicate have been modified. In various embodiments, IOMMU modified memory pages may include IOMMU-mapped pages that have been modified. The modified memory pages may be copied and provided to the migration destination at block 726. The memory pages may be converged at block 728. For example, convergence may be determined based on one or more of an incremental size going below a size threshold, an amount of time a loop has been running going over a time threshold, a combination thereof, and/or the like. In some embodiments, logic flow 700 may perform blocks 720, 722, 724, and/or 726 in a loop, for instance, making incremental copies while VCPUs and/or VDEVs are active, until convergence at block 728 is complete.
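The pre-copy phase of blocks 716 through 728 is an iterative loop with two exit conditions. The following Python sketch models it, assuming the EPT and IOMMU dirty-page scans are supplied as callables returning sets of page numbers and that `send` transmits a set of pages to the destination; all names and thresholds are hypothetical.

```python
import time


def precopy(all_pages, ept_dirty, iommu_dirty, send,
            size_threshold, time_threshold_s, clock=time.monotonic):
    """Iterative pre-copy; returns the number of incremental rounds."""
    send(set(all_pages))      # block 716: snapshot copy of all VM pages
    start = clock()
    rounds = 0
    while True:
        # Blocks 722/724: pages dirtied by CPUs (EPT) or by devices (IOMMU).
        dirty = ept_dirty() | iommu_dirty()
        send(dirty)           # block 726
        rounds += 1
        # Block 728: converged when the increment is small or time runs out.
        if len(dirty) < size_threshold or clock() - start > time_threshold_s:
            return rounds


cpu_rounds = iter([{1, 2, 3}, {2}, set()])
dev_rounds = iter([{7}, {8}, {9}])
sent = []
n = precopy(range(16), lambda: next(cpu_rounds), lambda: next(dev_rounds),
            sent.append, size_threshold=2, time_threshold_s=30.0)
print(n, sent[1:])
```

The time threshold guarantees termination even for a workload that dirties memory faster than it can be copied; the remaining dirty pages are then handled after the VM is suspended.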

Logic flow 700 may suspend the VM at block 730. For instance, logic flow 700 may suspend VCPUs at block 732 and/or inform the VDCM to suspend VDEV instances at block 734. In some embodiments, suspending VDEVs at block 734 may include the VDCM issuing an AI suspend command and waiting for the AI suspend complete command for all AIs assigned to the VDEV. In addition, outstanding I/O commands may be handled using one or more processes according to some embodiments. For example, handling of such outstanding I/O commands may be device specific and may be abstracted from VMM 530 using various interfaces, such as an AI suspend interface, an AI migration state interface, and/or an AI restore interface. For example, in some embodiments, physical device 550 may wait for all outstanding I/O commands from VM 515 to complete during an AI suspend process. Accordingly, an AI migration state process may not contain outstanding I/O commands. In this manner, an AI restore process associated with VM 515 may be simpler and more efficient. In another example, physical device 550 may wait to complete only those outstanding I/O commands that are already in a processing pipeline of physical device 550 during an AI suspend process. For instance, physical device 550 may include the queued I/O commands (for example, those still waiting to be processed) in an AI migration state process or element (for example, an information element). Accordingly, host driver 508 may save the outstanding I/O commands as part of an AI migration state process or element and migrate them to the target physical machine.
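The two outstanding-I/O policies described above differ only in what ends up in the AI migration state element. The sketch below models both, assuming each AI tracks commands already in the processing pipeline separately from commands merely queued; the `AiQueues` type, `suspend_ai` function, and the `pending_io` key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AiQueues:
    in_pipeline: List[str] = field(default_factory=list)  # already being processed
    queued: List[str] = field(default_factory=list)       # submitted, not yet started


def suspend_ai(ai: AiQueues, drain_all: bool) -> Tuple[List[str], dict]:
    """Suspend one AI; return (completed commands, AI migration state element)."""
    if drain_all:
        # Policy 1: finish everything, so the migration state has no I/O to replay.
        completed, pending = ai.in_pipeline + ai.queued, []
    else:
        # Policy 2: finish only in-pipeline work and carry queued commands over.
        completed, pending = list(ai.in_pipeline), list(ai.queued)
    ai.in_pipeline, ai.queued = [], []
    return completed, {"pending_io": pending}


print(suspend_ai(AiQueues(["r1"], ["w1", "w2"]), drain_all=False))
```

Policy 1 trades a longer suspend for a simpler restore; policy 2 shortens the suspend at the cost of replaying saved commands on the destination.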

At optional block 736, logic flow 700 may handle PRS requests according to some embodiments. For example, when physical device 550 encounters an I/O page fault and needs to perform PRS for an AI in an AI suspend process, physical device 550 may save the execution state of the corresponding I/O command as part of an AI migration state process or element. In this manner, the I/O command may be migrated to the target physical machine in the AI migration state process or element. On the target physical machine, host driver 508 may re-issue saved I/O commands to new AIs 564 a-c as part of the AI restore process. Host driver 508 may resume AI operation while VMM 530 may resume VCPUs of VM 515, and physical device 550 may send PRS requests to VMM 530. In another example, physical device 550 may send PRS requests for an AI 564 a-n to VMM 530 even after the AI suspend process has started. VMM 530 may determine whether VMM 530 should handle the PRS request or forward the PRS request to VM 515. In various embodiments, VMM 530 may determine whether to handle the PRS request by using the PASID and address in the PRS request and walking the first-level page table. For example, if nested translation is not set up for this PASID, or if the fault is not in the first-level page table, VMM 530 may handle the PRS request and return a response. In another example, if nested translation is set up for the PASID and the fault is in the first-level page table, the PRS request may be forwarded or otherwise provided to VM 515 for further handling. In a further example, if the PRS request is provided to VM 515 for further handling, VMM 530 may unmap fast-path registers of assigned devices of VM 515 so that any fast-path access (for instance, including work submissions) from VM 515 causes a VM exit to the VDCM instead of going directly to physical device 550. The VCPU to which the PRS interrupt is directed may be resumed.
In some embodiments, VMM 530 may also resume other VCPUs, either immediately and/or if/when the running VCPU sends an IPI to another VCPU. If VM 515 exits due to work submissions occurring before the AI suspend process completes, the work commands may be queued within the VDCM. Once the AI suspend process completes, the VCPUs may be suspended (or suspended again). The queued work commands may be included as part of the AI migration state element sent to the target physical machine.
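The PRS routing rule and the subsequent fast-path unmapping can be captured in a short model. In this Python sketch, the set of PASIDs with nested translation and the `faults_in_first_level` callable, which stands in for the VMM's walk of the guest first-level page table, are assumptions; `route_prs` and `SuspendingVm` are hypothetical names.

```python
def route_prs(pasid, addr, nested_pasids, faults_in_first_level):
    """Return 'vmm' or 'vm' for a PRS request received during AI suspend."""
    if pasid not in nested_pasids or not faults_in_first_level(pasid, addr):
        return "vmm"  # VMM resolves the fault and returns a response
    return "vm"       # the guest owns the faulting first-level table


class SuspendingVm:
    """Toy model of fast-path handling while a PRS request is with the guest."""

    def __init__(self):
        self.fast_path_mapped = True
        self.queued_work = []  # held in the VDCM, later added to the AI migration state

    def on_prs(self, pasid, addr, nested_pasids, faults_in_first_level):
        target = route_prs(pasid, addr, nested_pasids, faults_in_first_level)
        if target == "vm":
            # Unmap fast-path registers so later submissions VM-exit to the VDCM.
            self.fast_path_mapped = False
        return target

    def submit(self, cmd):
        if self.fast_path_mapped:
            return "device"
        self.queued_work.append(cmd)
        return "queued"


def in_guest_table(pasid, addr):
    # Hypothetical stand-in for the first-level page-table walk.
    return addr >= 0x4000


vm = SuspendingVm()
print(vm.on_prs(5, 0x1000, {5}, in_guest_table), vm.submit("wq0"))
print(vm.on_prs(5, 0x5000, {5}, in_guest_table), vm.submit("wq1"))
```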

Logic flow 700 may scan and clear memory bits at block 740. For example, logic flow 700 may scan and clear EPT bits at block 742. In some embodiments, scanning and clearing EPT bits at block 742 may include scanning and clearing D-bits in the EPT to find dirtied pages, for instance, pages dirtied by CPUs since the modified memory pages were copied at block 720. In another example, logic flow 700 may scan and clear IOMMU bits at block 744. In some embodiments, scanning and clearing IOMMU bits at block 744 may include scanning and clearing D-bits in the IOMMU to find dirtied memory pages, for instance, pages dirtied by I/O devices (for example, physical device 550) since the modified memory pages were copied at block 720. Logic flow 700 may copy the dirty pages and provide them to the migration destination at block 746.
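Blocks 742 and 744 apply the same scan-and-clear operation to two different tables. The sketch below assumes a simplified page-table model in which each entry is a dictionary with boolean `A` and `D` flags; it does not reproduce the actual EPT or IOMMU entry bit layouts, and `scan_and_clear_dirty` is a hypothetical name.

```python
def scan_and_clear_dirty(*tables):
    """Collect pages whose D-bit is set and clear it (toy page-table model).

    Each table maps a guest page frame number to {"A": bool, "D": bool}.
    """
    dirty = set()
    for table in tables:
        for pfn, entry in table.items():
            if entry["D"]:
                dirty.add(pfn)
                entry["D"] = False  # re-arm tracking for any later pass
    return dirty


ept = {0x10: {"A": True, "D": True}, 0x11: {"A": True, "D": False}}
iommu = {0x11: {"A": True, "D": True}, 0x12: {"A": False, "D": False}}
print(sorted(scan_and_clear_dirty(ept, iommu)))  # pages dirtied by CPUs or devices
```

Taking the union of both scans matters: a page written only by device DMA never has its EPT D-bit set, so an EPT-only scan would miss it.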

At block 750, logic flow 700 may migrate virtual states of the VM. For example, logic flow 700 may save VCPU state at block 752. In some embodiments, saving VCPU state at block 752 may include saving VCPU state information, including any posted interrupts and/or VAPIC state information. At block 754, logic flow 700 may save migration states. For instance, VDCM 502 may use host driver 508 to save migration state information as one or more AI migration state elements that may include, for instance, all the AIs assigned to VM 515. At block 756, logic flow 700 may provide virtual states to the migration destination. For example, all VCPU and VDEV state information, including AI migration state elements, may be provided to the migration destination along with, for instance, other non-performance-critical VDEV state information.

Logic flow 700 may generate a VM on the migration destination at block 760. For instance, on the migration destination, a VMM may create a new VM and restore all memory states received from the migration source (source physical machine). The VDCM on the migration destination may create a VDEV using the source VDEV's virtual device state. The virtual registers may be initialized with the values of the source VDEV's corresponding virtual registers. In some embodiments, the VDCM may request that the host driver allocate the required AIs. The AI migration state information from the source AIs may be restored to the allocated AIs by the host driver, for instance, using an AI restore interface, and mapped to the migrated VM's VDEV. In various embodiments, the AIs may then be enabled and the VM resumed to normal operation.
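The destination-side sequence of block 760 is sketched below. The host driver's AI allocation and the AI restore interface are passed in as callables because the source does not define their signatures; `restore_vdev`, `allocate_ais`, and `restore_ai` are hypothetical, and the VDEV is modeled as a plain dictionary.

```python
def restore_vdev(src_regs, ai_states, allocate_ais, restore_ai):
    """Rebuild a VDEV on the destination (all callbacks are hypothetical)."""
    # Initialize virtual registers from the source VDEV's values.
    vdev = {"regs": dict(src_regs), "ais": [], "enabled": False}
    new_ais = allocate_ais(len(ai_states))  # host driver allocates AIs
    if len(new_ais) != len(ai_states):
        raise RuntimeError("destination could not allocate enough AIs")
    for ai, state in zip(new_ais, ai_states):
        restore_ai(ai, state)    # AI restore interface
        vdev["ais"].append(ai)   # map the restored AI into the VDEV
    vdev["enabled"] = True       # enable AIs; the VM may now resume
    return vdev


restored = {}
vdev = restore_vdev({"CTRL": 0x1},
                    [{"pending_io": ["w1"]}, {"pending_io": []}],
                    allocate_ais=lambda n: list(range(40, 40 + n)),
                    restore_ai=restored.__setitem__)
print(vdev)
```

Note that the destination AIs need not have the same indexes as the source AIs; only the saved per-AI state is carried across.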

FIG. 8 illustrates an example of a storage medium 800. Storage medium 800 may comprise an article of manufacture. In some examples, storage medium 800 may include any non-transitory computer readable medium or machine readable medium, such as an optical, magnetic or semiconductor storage. Storage medium 800 may store various types of computer executable instructions, such as instructions to implement logic flow 700. Examples of a computer readable or machine readable storage medium may include any tangible media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. Examples of computer executable instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, object-oriented code, visual code, and the like. The examples are not limited in this context.

FIG. 9 illustrates an example computing platform 900. In some examples, as shown in FIG. 9, computing platform 900 may include a processing component 940, other platform components 950, or a communications interface 960. According to some examples, computing platform 900 may be implemented in a computing device such as a server in a system such as a data center. Embodiments are not limited in this context.

According to some examples, processing component 940 may execute processing operations or logic for system 305 and/or storage medium 800. Processing component 940 may include various hardware elements, software elements, or a combination of both. Examples of hardware elements may include devices, logic devices, components, processors, microprocessors, circuits, processor circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGAs), memory units, logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. Examples of software elements may include software components, programs, applications, computer programs, application programs, device drivers, system programs, software development programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an example is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints, as desired for a given example.

In some examples, other platform components 950 may include common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components (e.g., digital displays), power supplies, and so forth. Examples of memory units may include without limitation various types of computer readable and machine readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), double data rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as redundant array of independent disks (RAID) drives, solid state memory devices (e.g., USB memory), solid state drives (SSD) and any other type of storage media suitable for storing information.

In some examples, communications interface 960 may include logic and/or features to support a communication interface. For these examples, communications interface 960 may include one or more communication interfaces that operate according to various communication protocols or standards to communicate over direct or network communication links. Direct communications may occur via use of communication protocols or standards described in one or more industry standards (including progenies and variants) such as those associated with the PCI Express specification. Network communications may occur via use of communication protocols or standards such as those described in one or more Ethernet standards promulgated by the Institute of Electrical and Electronics Engineers (IEEE). For example, one such Ethernet standard may include IEEE 802.3-2012, carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications, published in December 2012 (hereinafter “IEEE 802.3”). Network communication may also occur according to one or more OpenFlow specifications such as the OpenFlow Hardware Abstraction API Specification. Network communications may also occur according to the Infiniband Architecture Specification, Volume 1, Release 1.3, published in March 2015 (“the Infiniband Architecture Specification”).

Computing platform 900 may be part of a computing device that may be, for example, a server, a server array or server farm, a web server, a network server, an Internet server, a work station, a mini-computer, a main frame computer, a supercomputer, a network appliance, a web appliance, a distributed computing system, multiprocessor systems, processor-based systems, or combination thereof. Accordingly, functions and/or specific configurations of computing platform 900 described herein may be included or omitted in various embodiments of computing platform 900, as suitably desired.

The components and features of computing platform 900 may be implemented using any combination of discrete circuitry, ASICs, logic gates and/or single chip architectures. Further, the features of computing platform 900 may be implemented using microcontrollers, programmable logic arrays and/or microprocessors or any combination of the foregoing where suitably appropriate. It is noted that hardware, firmware and/or software elements may be collectively or individually referred to herein as “logic” or “circuit.”

It should be appreciated that the exemplary computing platform 900 shown in the block diagram of FIG. 9 may represent one functionally descriptive example of many potential implementations. Accordingly, division, omission or inclusion of block functions depicted in the accompanying figures does not imply that the hardware components, circuits, software and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.

One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores,” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor. Some embodiments may be implemented, for example, using a machine-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, compact disk read only memory (CD-ROM), compact disk recordable (CD-R), compact disk rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of digital versatile disk (DVD), a tape, a cassette, or the like.
The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.

The following are non-limiting examples according to some embodiments:

Example 1 is an apparatus, comprising at least one memory, at least one processor, and logic to transfer a virtual machine (VM), at least a portion of the logic comprised in hardware coupled to the at least one memory and the at least one processor, the logic to generate a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of a physical device to be virtualized by a virtual machine monitor (VMM), the plurality of virtualized capability registers comprising a plurality of device-specific capabilities of the physical device, determine at least one version of the physical device to support via the VMM, and expose at least a subset of the virtualized capability registers associated with the at least one version to the VM.

Example 2 is the apparatus of Example 1, the logic to access the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 3 is the apparatus of Example 1, the logic to operate a host driver, the host driver operative to access the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 4 is the apparatus of Example 1, the logic to assign at least one command interface to the VM by mapping the at least one command interface into a memory-mapped I/O (MMIO) space of the VDEV.

Example 5 is the apparatus of Example 1, the logic to assign at least one command interface to the VM by mapping the at least one command interface into a memory-mapped I/O (MMIO) space of the VDEV, the at least one command interface comprising at least one assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.

Example 6 is the apparatus of Example 1, the logic to migrate the VM to a computing device having the physical device compatible with the at least one version.

Example 7 is the apparatus of Example 1, the logic to expose at least a subset of the virtualized capability registers associated with the version to a guest driver of the VM.

Example 8 is the apparatus of Example 1, the logic to, via the VMM, determine a compatibility between a guest driver associated with the VM and a target physical device.

Example 9 is the apparatus of Example 1, the device-specific capability registers comprising memory-mapped I/O (MMIO) registers.

Example 10 is the apparatus of Example 1, the physical device comprising at least one input/output (I/O) device.

Example 11 is the apparatus of Example 1, comprising an input/output (I/O) memory management unit (IOMMU) to provide accessed and dirty (A/D) bit support for the IOMMU.

Example 12 is the apparatus of Example 1, the logic to detect dirty memory pages associated with the VM during VM migration using accessed and dirty (A/D) bit support of virtualization page tables.

Example 13 is the apparatus of Example 1, the VDEV operative within a virtual device composition module (VDCM) of a host operating system (OS).

Example 14 is the apparatus of Example 1, the logic to copy memory pages of the VM that are modified during migration of the VM, and provide the modified memory pages to a migration destination for the VM.

Example 15 is the apparatus of Example 1, the logic to suspend the VM during migration of the VM to a migration destination, and clear modified bits in modified memory pages associated with the VM responsive to suspension of the VM.

Example 16 is the apparatus of Example 1, the logic to suspend the VM during migration of the VM to a migration destination, and clear modified bits in modified memory pages associated with the VM responsive to suspension of the VM, the modified memory pages comprising at least one of second-level translation tables or input/output (I/O) memory management unit (IOMMU) mapped pages.

Example 17 is the apparatus of Example 1, the logic to save migration state information in a migration state element.

Example 18 is the apparatus of Example 1, the logic to save migration state information in a migration state element, the migration state element comprising a command interface element comprising a migration state for at least one command interface assigned to the VM.

Example 19 is the apparatus of Example 1, the logic to provide the device-specific capabilities of the physical device to a guest driver associated with the physical device via providing access to the at least a subset of the virtualized capability registers to the guest driver.

Example 20 is the apparatus of Example 1, the logic to communicate with the physical device via at least one command interface, the at least one command interface comprising at least one assignable interface.

Example 21 is the apparatus of Example 1, the logic to provide a command interface suspend request message to request the physical device to suspend a command interface prior to a migration of the VM.

Example 22 is the apparatus of Example 1, the logic to provide a command interface restore request message to request the physical device to restore a command interface after a migration of the VM.

Example 23 is the apparatus of Example 1, the logic to receive a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM.

Example 24 is the apparatus of Example 1, the logic to receive a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM, the command interface state information comprising queued input/output (I/O) commands.
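Examples 21 to 24 together describe a suspend, save, and restore sequence for a command interface. The following sketch models that sequence with a hypothetical `CommandInterface` class and `CiMigrationState` record; the class, method, and field names are assumptions for illustration and do not correspond to a defined device interface. Commands still queued when the interface is suspended travel in the migration state element and are re-queued on the destination.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CiMigrationState:
    """Hypothetical command interface migration state element."""
    ci_id: int
    queued_commands: list = field(default_factory=list)


class CommandInterface:
    def __init__(self, ci_id):
        self.ci_id = ci_id
        self.queue = deque()
        self.suspended = False

    def submit(self, command):
        if self.suspended:
            raise RuntimeError("command interface is suspended")
        self.queue.append(command)

    def suspend(self):
        """Handle a suspend request: stop accepting work, export queued commands."""
        self.suspended = True
        state = CiMigrationState(self.ci_id, list(self.queue))
        self.queue.clear()
        return state

    @classmethod
    def restore(cls, state):
        """Handle a restore request on the destination from the saved state."""
        ci = cls(state.ci_id)
        ci.queue.extend(state.queued_commands)
        return ci


source = CommandInterface(5)
source.submit("read blk 0")
source.submit("write blk 1")
saved = source.suspend()                   # sent to the migration destination
target = CommandInterface.restore(saved)   # resumes with the same pending work
```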

Example 25 is the apparatus of Example 1, the logic to determine that the physical device handles page request service (PRS) requests, receive an execution state of an input/output (I/O) command from the physical device, the I/O command associated with a PRS I/O page fault, and provide the I/O command to a target physical machine during migration of the VM.

Example 26 is the apparatus of Example 1, the logic to receive, at the VMM, a page request service (PRS) request from the physical device during migration of the VM.

Example 27 is the apparatus of Example 1, the logic to receive, at the VMM, a page request service (PRS) request from the physical device during migration of the VM, and determine whether to handle the PRS request via the VMM or the VM.

Example 28 is the apparatus of Example 1, the logic to receive, at the VMM, a page request service (PRS) request from the physical device during migration of the VM, and perform the PRS request, via the VMM, based on one of an address space associated with the VMM in the PRS or a location of a fault of the PRS.
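Examples 27 and 28 leave open exactly how the VMM decides whether to service a PRS request itself or forward it to the VM. One possible policy is sketched below under stated assumptions: each request carries a PASID naming the faulting address space, the VMM knows which PASIDs it manages directly, and it also knows which page addresses it has itself made non-present (for example, pages being migrated). `PrsRequest`, `Handler`, and `route_prs_request` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Handler(Enum):
    VMM = "vmm"
    GUEST = "guest"


@dataclass(frozen=True)
class PrsRequest:
    pasid: int         # address space in which the device faulted
    page_address: int  # faulting page address


def route_prs_request(req, vmm_pasids, vmm_removed_pages):
    """Decide who services a page request received during VM migration.

    vmm_pasids: PASIDs whose address spaces the VMM manages directly.
    vmm_removed_pages: page addresses whose translation the VMM itself
    removed (an assumption made for this sketch).
    """
    if req.pasid in vmm_pasids:
        return Handler.VMM    # decided by the address space in the request
    if req.page_address in vmm_removed_pages:
        return Handler.VMM    # decided by the location of the fault
    return Handler.GUEST      # otherwise the guest's own mapping is missing


decision = route_prs_request(PrsRequest(pasid=2, page_address=0x3000), {1}, {0x2000})
```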

Example 29 is the apparatus of Example 1, the VM operative in a single root input/output virtualization (SR-IOV) architecture.

Example 30 is the apparatus of Example 1, the VM operative in a scalable input/output virtualization (S-IOV) architecture.

Example 31 is a system to transfer a virtual machine (VM), comprising an apparatus according to any of Examples 1 to 30, and at least one radio frequency (RF) transceiver.

Example 32 is a method to transfer a virtual machine (VM), the method comprising generating a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of a physical device to be virtualized by a virtual machine monitor (VMM), the plurality of virtualized capability registers comprising a plurality of device-specific capabilities of the physical device, determining at least one version of the physical device to support via the VMM, and exposing a subset of the virtualized capability registers associated with the at least one version to the VM.
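The central step of Example 32, exposing only the virtualized capability registers that belong to the supported device version, can be sketched as a filter. In this sketch each register is tagged with the hypothetical device version that introduced it, and the VMM picks a version that every candidate migration host can provide (here, simply the lowest one) so that the guest driver never depends on a capability a destination lacks. The register names, version numbers, and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtCapRegister:
    name: str
    value: int
    since_version: int  # hypothetical: first device version with this capability


def choose_supported_version(host_device_versions):
    """Pick a version that every candidate host's physical device provides."""
    return min(host_device_versions)


def expose_capabilities(virt_regs, version):
    """Return the subset of virtualized capability registers shown to the VM."""
    return {r.name: r.value for r in virt_regs if r.since_version <= version}


regs = [
    VirtCapRegister("CAP_BASE", 0x1, 1),
    VirtCapRegister("CAP_BATCH", 0x3, 2),
    VirtCapRegister("CAP_PRS", 0x1, 3),
]
version = choose_supported_version([3, 2, 3])
guest_view = expose_capabilities(regs, version)  # CAP_PRS stays hidden
```

Choosing the minimum is only one policy; a VMM could equally pin the version chosen at VM creation and refuse migration to hosts whose device is older.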

Example 33 is the method of Example 32, comprising accessing the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 34 is the method of Example 32, comprising operating a host driver, the host driver operative to access the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 35 is the method of Example 32, comprising assigning at least one command interface to the VM by mapping the at least one command interface into a memory-mapped I/O (MMIO) space of the VDEV.

Example 36 is the method of Example 32, comprising assigning at least one command interface to the VM by mapping the at least one command interface into a memory-mapped I/O (MMIO) space of the VDEV, the at least one command interface comprising at least one assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.

Example 37 is the method of Example 32, comprising migrating the VM to a computing device having the physical device compatible with the at least one version.

Example 38 is the method of Example 32, comprising exposing at least a subset of the virtualized capability registers associated with the version to a guest driver of the VM.

Example 39 is the method of Example 32, comprising determining, via the VMM, a compatibility between a guest driver associated with the VM and a target physical device.

Example 40 is the method of Example 32, the device-specific capability registers comprising memory-mapped I/O (MMIO) registers.

Example 41 is the method of Example 32, the physical device comprising at least one input/output (I/O) device.

Example 42 is the method of Example 32, comprising providing accessed and dirty (A/D) bit support for an input/output (I/O) memory management unit (IOMMU).

Example 43 is the method of Example 32, comprising detecting dirty memory pages associated with the VM during VM migration using accessed and dirty (A/D) bit support of virtualization page tables.

Example 44 is the method of Example 32, the VDEV operative within a virtual device composition module (VDCM) of a host operating system (OS).

Example 45 is the method of Example 32, comprising copying memory pages of the VM that are modified during migration of the VM, and providing the modified memory pages to a migration destination for the VM.

Example 46 is the method of Example 32, comprising suspending the VM during migration of the VM to a migration destination, and clearing modified bits in modified memory pages associated with the VM responsive to suspension of the VM.

Example 47 is the method of Example 32, comprising suspending the VM during migration of the VM to a migration destination, and clearing modified bits in modified memory pages associated with the VM responsive to suspension of the VM, the modified memory pages comprising at least one of second-level translation tables or input/output (I/O) memory management unit (IOMMU) mapped pages.

Example 48 is the method of Example 32, comprising saving migration state information in a migration state element.

Example 49 is the method of Example 32, comprising saving migration state information in a migration state element, the migration state element comprising a command interface element comprising a migration state for at least one command interface assigned to the VM.

Example 50 is the method of Example 32, comprising providing the device-specific capabilities of the physical device to a guest driver associated with the physical device via providing access to the at least a subset of the virtualized capability registers to the guest driver.

Example 51 is the method of Example 32, comprising communicating with the physical device via at least one command interface, the at least one command interface comprising at least one assignable interface.

Example 52 is the method of Example 32, comprising providing a command interface suspend request message to request the physical device to suspend a command interface prior to a migration of the VM.

Example 53 is the method of Example 32, comprising providing a command interface restore request message to request the physical device to restore a command interface after a migration of the VM.

Example 54 is the method of Example 32, comprising receiving a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM.

Example 55 is the method of Example 32, comprising receiving a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM, the command interface state information comprising queued input/output (I/O) commands.

Example 56 is the method of Example 32, comprising determining that the physical device handles page request service (PRS) requests, receiving an execution state of an input/output (I/O) command from the physical device, the I/O command associated with a PRS I/O page fault, and providing the I/O command to a target physical machine during migration of the VM.

Example 57 is the method of Example 32, comprising receiving, at the VMM, page request service (PRS) requests from the physical device during migration of the VM.

Example 58 is the method of Example 32, comprising receiving, at the VMM, page request service (PRS) requests from the physical device during migration of the VM, and determining whether to handle the PRS requests via the VMM or the VM.

Example 59 is the method of Example 32, comprising receiving, at the VMM, a page request service (PRS) request from the physical device during migration of the VM, and performing the PRS request at the VMM based on one of an address space associated with the VMM in the PRS or a location of a fault of the PRS.

Example 60 is a computer-readable storage medium that stores instructions for execution by processing circuitry of a computing device to transfer a virtual machine (VM), the instructions to cause the computing device to generate a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of a physical device to be virtualized by a virtual machine monitor (VMM), the plurality of virtualized capability registers comprising a plurality of device-specific capabilities of the physical device, determine at least one version of the physical device to support via the VMM, and expose a subset of the virtualized capability registers associated with the at least one version to the VM.

Example 61 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to access the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 62 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to operate a host driver, the host driver operative to access the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 63 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to assign at least one command interface to the VM by mapping the at least one command interface into a memory-mapped I/O (MMIO) space of the VDEV.

Example 64 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to assign at least one command interface to the VM by mapping the at least one command interface into a memory-mapped I/O (MMIO) space of the VDEV, the at least one command interface comprising at least one assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.

Example 65 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to migrate the VM to a computing device having the physical device compatible with the at least one version.

Example 66 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to expose at least a subset of the virtualized capability registers associated with the version to a guest driver of the VM.

Example 67 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to determine, via the VMM, a compatibility between a guest driver associated with the VM and a target physical device.

Example 68 is the computer-readable storage medium of Example 60, the device-specific capability registers comprising memory-mapped I/O (MMIO) registers.

Example 69 is the computer-readable storage medium of Example 60, the physical device comprising at least one input/output (I/O) device.

Example 70 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to provide accessed and dirty (A/D) bit support for an input/output (I/O) memory management unit (IOMMU).

Example 71 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to detect dirty memory pages associated with the VM during VM migration using accessed and dirty (A/D) bit support of virtualization page tables.

Example 72 is the computer-readable storage medium of Example 60, the VDEV operative within a virtual device composition module (VDCM) of a host operating system (OS).

Example 73 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to copy memory pages of the VM that are modified during migration of the VM, and provide the modified memory pages to a migration destination for the VM.

Example 74 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to suspend the VM during migration of the VM to a migration destination, and clear modified bits in modified memory pages associated with the VM responsive to suspension of the VM.

Example 75 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to suspend the VM during migration of the VM to a migration destination, and clear modified bits in modified memory pages associated with the VM responsive to suspension of the VM, the modified memory pages comprising at least one of second-level translation tables or input/output (I/O) memory management unit (IOMMU) mapped pages.

Example 76 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to save migration state information in a migration state element.

Example 77 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to save migration state information in a migration state element, the migration state element comprising a command interface element comprising a migration state for at least one command interface assigned to the VM.

Example 78 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to provide the device-specific capabilities of the physical device to a guest driver associated with the physical device via providing access to the at least a subset of the virtualized capability registers to the guest driver.

Example 79 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to communicate with the physical device via at least one command interface, the at least one command interface comprising at least one assignable interface.

Example 80 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to provide a command interface suspend request message to request the physical device to suspend a command interface prior to a migration of the VM.

Example 81 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to provide a command interface restore request message to request the physical device to restore a command interface after a migration of the VM.

Example 82 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to receive a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM.

Example 83 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to receive a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM, the command interface state information comprising queued input/output (I/O) commands.

Example 84 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to determine that the physical device handles page request service (PRS) requests, receive an execution state of an input/output (I/O) command from the physical device, the I/O command associated with a PRS I/O page fault, and provide the I/O command to a target physical machine during migration of the VM.

Example 85 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to receive, at the VMM, page request service (PRS) requests from the physical device during migration of the VM.

Example 86 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to receive, at the VMM, page request service (PRS) requests from the physical device during migration of the VM, and determine whether to handle the PRS requests via the VMM or the VM.

Example 87 is the computer-readable storage medium of Example 60, comprising instructions to cause the computing device to receive, at the VMM, a page request service (PRS) request from the physical device during migration of the VM, and perform the PRS request at the VMM based on one of an address space associated with the VMM in the PRS or a location of a fault of the PRS.

Example 88 is an apparatus, comprising a device version means to determine at least one version of a physical device to support via a virtual machine manager (VMM) means, and a virtual capability register means to generate a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of the physical device to be virtualized by the VMM means, the plurality of virtualized capability registers comprising a plurality of device-specific capabilities of the physical device, and expose at least a subset of the virtualized capability registers associated with the at least one version to a virtual machine (VM).

Example 89 is the apparatus of Example 88, the virtual capability register means to access the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 90 is the apparatus of Example 88, a host means to operate a host driver, the host driver operative to access the device-specific capability registers via a physical function (PF) driver of the physical device.

Example 91 is the apparatus of Example 88, a command interface means to assign at least one command interface to the VM by mapping the at least one command interface into a memory-mapped I/O (MMIO) space of the VDEV.

Example 92 is the apparatus of Example 88, a VM migration means to migrate the VM to a computing device having the physical device compatible with the at least one version.

Example 93 is the apparatus of Example 88, the virtual capability register means to expose at least a subset of the virtualized capability registers associated with the version to a guest driver of the VM.

Example 94 is the apparatus of Example 88, the VMM means to determine a compatibility between a guest driver associated with the VM and a target physical device.

Example 95 is the apparatus of Example 88, the device-specific capability registers comprising memory-mapped I/O (MMIO) registers.

Example 96 is the apparatus of Example 88, the physical device comprising at least one input/output (I/O) device.

Example 97 is the apparatus of Example 88, comprising an input/output (I/O) memory management unit (IOMMU) that provides accessed and dirty (A/D) bit support.

Example 98 is the apparatus of Example 88, a memory monitoring means to detect dirty memory pages associated with the VM during VM migration using accessed and dirty (A/D) bit support of virtualization page tables.

Example 99 is the apparatus of Example 88, the VDEV operative within a virtual device composition module (VDCM) of a host operating system (OS).

Example 100 is the apparatus of Example 88, a memory monitoring means to copy memory pages of the VM that are modified during migration of the VM, and provide the modified memory pages to a migration destination for the VM.

Example 101 is the apparatus of Example 88, a VM migration means to suspend the VM during migration of the VM to a migration destination, and clear modified bits in modified memory pages associated with the VM responsive to suspension of the VM.

Example 102 is the apparatus of Example 88, a VM migration means to suspend the VM during migration of the VM to a migration destination, and clear modified bits in modified memory pages associated with the VM responsive to suspension of the VM, the modified memory pages comprising at least one of second-level translation tables or input/output (I/O) memory management unit (IOMMU) mapped pages.

Example 103 is the apparatus of Example 88, a VM migration means to save migration state information in a migration state element.

Example 104 is the apparatus of Example 88, a VM migration means to save migration state information in a migration state element, the migration state element comprising a command interface element comprising a migration state for at least one command interface assigned to the VM.

Example 105 is the apparatus of Example 88, a VM migration means to provide the device-specific capabilities of the physical device to a guest driver associated with the physical device via providing access to the at least a subset of the virtualized capability registers to the guest driver.

Example 106 is the apparatus of Example 88, the VMM means to communicate with the physical device via at least one command interface, the at least one command interface comprising at least one assignable interface.

Example 107 is the apparatus of Example 88, the VMM means to provide a command interface suspend request message to request the physical device to suspend a command interface prior to a migration of the VM.

Example 108 is the apparatus of Example 88, the VMM means to provide a command interface restore request message to request the physical device to restore a command interface after a migration of the VM.

Example 109 is the apparatus of Example 88, the VMM means to receive a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM.

Example 110 is the apparatus of Example 88, the VMM means to receive a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM, the command interface state information comprising queued input/output (I/O) commands.

Example 111 is the apparatus of Example 88, the VMM means to determine that the physical device handles page request service (PRS) requests, receive an execution state of an input/output (I/O) command from the physical device, the I/O command associated with a PRS I/O page fault, and provide the I/O command to a target physical machine during migration of the VM.

Example 112 is the apparatus of Example 88, the VMM means to receive a page request service (PRS) request from the physical device during migration of the VM.

Example 113 is the apparatus of Example 88, the VMM means to receive a page request service (PRS) request from the physical device during migration of the VM, and determine whether to handle the PRS request via the VMM means or the VM.

Example 114 is the apparatus of Example 88, the VMM means to receive a page request service (PRS) request from the physical device during migration of the VM, and perform the PRS request, via the VMM means, based on one of an address space associated with the VMM means in the PRS or a location of a fault of the PRS.

Example 115 is an apparatus, comprising at least one memory, at least one processor, and logic to manage resources of at least one virtual machine (VM) in a virtualized computing environment, at least a portion of the logic comprised in hardware coupled to the at least one memory and the at least one processor, the logic to emulate at least one virtual device (VDEV) via a virtual device composition module (VDCM), the VDEV comprising at least one command interface, map the at least one command interface to at least one command interface memory-mapped I/O (MMIO) register of at least one physical device, the at least one command interface MMIO register associated with at least one backend resource, and map the at least one command interface to the at least one VM to provide the at least one VM access to the at least one backend resource.

Example 116 is the apparatus of Example 115, the at least one command interface comprising at least one assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.

Example 117 is the apparatus of Example 115, the logic to map the at least one command interface to the VM using a second-level translation table of the at least one processor.

Example 118 is the apparatus of Example 115, the logic to provide a virtual machine manager (VMM), the VMM to generate the at least one VDEV.

Example 119 is the apparatus of Example 115, the logic to indicate a page-fault status of the at least one command interface.

Example 120 is the apparatus of Example 115, the logic to indicate a page-fault status of the at least one command interface, the page-fault status to indicate whether the at least one command interface is one of fully page-fault capable, partially page-fault capable, or not page-fault capable.

Example 121 is the apparatus of Example 115, the logic to support memory over-commit for the at least one VM responsive to determining that the at least one command interface is fully page-fault capable.

Example 122 is the apparatus of Example 115, the logic to support memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using at least one MMIO capability register of a host driver.

Example 123 is the apparatus of Example 115, the logic to support memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using a host driver.

Example 124 is the apparatus of Example 115, the logic to receive an indication that the at least one command interface is partially page-fault capable, and determine pinned memory pages of a guest driver associated with the at least one VM.

Example 125 is the apparatus of Example 115, the logic to receive an indication that the at least one command interface is partially page-fault capable, determine pinned memory pages of a guest driver associated with the at least one VM, and pin the pinned memory pages in a second-level translation table used by a virtual machine manager (VMM) associated with the at least one VM.
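Examples 119 to 125 tie memory over-commit to the page-fault capability reported for each command interface. A minimal sketch of that decision follows, assuming the three capability levels of Example 120 and modelling the second-level translation table only as the set of guest pages that must stay pinned. `PageFaultCapability` and `plan_memory_policy` are hypothetical names, and treating the whole VM according to its most restrictive interface, as well as letting unpinned pages remain eligible for over-commit in the partial case, are assumptions of this sketch rather than statements of the disclosure.

```python
from enum import Enum


class PageFaultCapability(Enum):
    FULL = "full"        # device DMA to any guest page may fault and be resolved
    PARTIAL = "partial"  # only pages the guest driver pins must stay resident
    NONE = "none"        # no page-fault support: all guest memory must be resident


def plan_memory_policy(capabilities, guest_pinned_pages, all_guest_pages):
    """Return (overcommit_allowed, pages_to_pin_in_second_level_table).

    capabilities: capability of each command interface mapped to the VM.
    The most restrictive interface decides the policy for the whole VM.
    """
    if PageFaultCapability.NONE in capabilities:
        return False, set(all_guest_pages)
    if PageFaultCapability.PARTIAL in capabilities:
        # Assumption: pages outside the guest driver's pinned set stay pageable.
        return True, set(guest_pinned_pages)
    return True, set()  # fully page-fault capable: nothing needs pinning


policy = plan_memory_policy(
    [PageFaultCapability.FULL, PageFaultCapability.PARTIAL],
    guest_pinned_pages={1},
    all_guest_pages={1, 2, 3},
)
```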

Example 126 is the apparatus of Example 115, the logic to determine at least one VM resource requirement for the at least one VM.

Example 127 is the apparatus of Example 115, the logic to re-map the at least one command interface based on a VM resource requirement of the at least one VM.

Example 128 is the apparatus of Example 115, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, the logic to unmap the first command interface from the first VM, and map the first command interface to the second VM.

Example 129 is the apparatus of Example 115, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, the logic to determine a VM resource requirement for the first VM and the second VM, and re-map the first command interface to the second VM responsive to the VM resource requirement for the second VM being greater than that for the first VM.
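Examples 127 to 129 re-map a command interface from one VM to another according to their resource requirements. The sketch below keeps the assignment in a dictionary and moves one interface from the first VM to the second VM only when the second VM's requirement is greater. Expressing a requirement as a single number per VM, and the name `rebalance_command_interface`, are assumptions for illustration; a real VMM would also quiesce the interface and update the VDEV MMIO and second-level translation mappings, which are not modelled here.

```python
def rebalance_command_interface(assignments, requirements, first_vm, second_vm):
    """Move one command interface from first_vm to second_vm if second_vm needs more.

    assignments: dict mapping VM name -> list of command interface IDs.
    requirements: dict mapping VM name -> resource requirement (larger = needs more).
    Returns the ID of the re-mapped interface, or None if nothing moved.
    """
    if requirements[second_vm] <= requirements[first_vm]:
        return None
    if not assignments[first_vm]:
        return None
    ci = assignments[first_vm].pop()    # unmap the interface from the first VM
    assignments[second_vm].append(ci)   # map the same interface to the second VM
    return ci


vm_assignments = {"vm1": [10, 11], "vm2": [20]}
moved = rebalance_command_interface(vm_assignments, {"vm1": 1, "vm2": 5}, "vm1", "vm2")
```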

Example 130 is the apparatus of Example 115, the at least one command interface comprising a plurality of command interfaces, at least a portion of the command interfaces associated with at least one different physical device.

Example 131 is the apparatus of Example 115, the at least one command interface comprising a plurality of command interfaces, at least a portion of the command interfaces associated with a plurality of different physical devices, each of the plurality of different physical devices comprising a same type of physical device.

Example 132 is the apparatus of Example 115, the at least one command interface comprising a plurality of command interfaces, at least a portion of the command interfaces associated with a plurality of different physical devices, at least a portion of the plurality of different physical devices comprising a different type of physical device.

Example 133 is a system to manage resources of at least one virtual machine (VM), comprising an apparatus according to any of Examples 115 to 132, and at least one radio frequency (RF) transceiver.

Example 134 is a method to manage resources of at least one virtual machine (VM) in a virtualized computing environment, comprising emulating at least one virtual device (VDEV) via a virtual device composition module (VDCM), the VDEV comprising at least one command interface, mapping the at least one command interface to at least one command interface memory-mapped I/O (MMIO) register of at least one physical device, the at least one command interface MMIO register associated with at least one backend resource, and mapping the at least one command interface to the VM to provide the VM access to the at least one backend resource.

Example 135 is the method of Example 134, the at least one command interface comprising at least one assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.

Example 136 is the method of Example 134, comprising mapping the at least one command interface to the VM using a second-level translation table of at least one processor.

Example 137 is the method of Example 134, comprising providing a virtual machine manager (VMM), the VMM to generate the at least one VDEV.

Example 138 is the method of Example 134, comprising indicating apage-fault status of the at least one command interface.

Example 139 is the method of Example 134, comprising indicating apage-fault status of the at least one command interface, the page-faultstatus to indicate whether the at least one command interface is one offully page-fault capable, partially page-fault capable, or notpage-fault capable.

Example 140 is the method of Example 134, comprising supporting memory over-commit for the at least one VM responsive to determining that the at least one command interface is fully page-fault capable.

Example 141 is the method of Example 134, comprising supporting memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using at least one MMIO capability register of a host driver.

Example 142 is the method of Example 134, comprising supporting memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using a host driver.

Example 143 is the method of Example 134, comprising receiving an indication that the at least one command interface is partially page-fault capable, and determining pinned memory pages of a guest driver associated with the at least one VM.

Example 144 is the method of Example 134, comprising receiving an indication that the at least one command interface is partially page-fault capable, determining pinned memory pages of a guest driver associated with the at least one VM, and pinning the pinned memory pages in a second-level translation table used by a virtual memory manager (VMM) associated with the at least one VM.

Example 145 is the method of Example 134, comprising determining at least one VM resource requirement for the at least one VM.

Example 146 is the method of Example 134, comprising re-mapping the at least one command interface based on a VM resource requirement of the at least one VM.

Example 147 is the method of Example 134, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, comprising unmapping the first command interface from the first VM, and mapping the first command interface to the second VM.

Example 148 is the method of Example 134, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, comprising determining a VM resource requirement for the first VM and the second VM, and re-mapping the first command interface to the second VM responsive to the VM resource requirement for the second VM being greater than for the first VM.

Example 149 is the method of Example 134, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with at least one different physical device.

Example 150 is the method of Example 134, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with a plurality of different physical devices, each of the plurality of different physical devices comprising a same type of physical device.

Example 151 is the method of Example 134, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with a plurality of different physical devices, at least a portion of the plurality of different physical devices comprising a different type of physical device.
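Examples 138 to 144 tie the handling of guest memory to the page-fault status indicated for a command interface. As a rough illustration only, the following Python sketch models one way a VMM-side policy might branch on that status. The names `PageFaultStatus`, `MemoryPolicy`, and `plan_memory_policy`, and the modeling of guest memory as a set of page frame numbers, are hypothetical. The treatment of a command interface that is not page-fault capable (keep every guest page pinned) is an assumption that these Examples do not state.

```python
from dataclasses import dataclass
from enum import Enum


class PageFaultStatus(Enum):
    """Page-fault status a command interface may indicate (hypothetical encoding)."""
    FULL = "fully page-fault capable"
    PARTIAL = "partially page-fault capable"
    NONE = "not page-fault capable"


@dataclass(frozen=True)
class MemoryPolicy:
    """Hypothetical result: which guest pages stay pinned and which may be reclaimed."""
    pinned_pages: frozenset
    reclaimable_pages: frozenset

    @property
    def overcommit_allowed(self) -> bool:
        # Over-commit is only possible if at least one guest page can be reclaimed.
        return bool(self.reclaimable_pages)


def plan_memory_policy(status, guest_pages, guest_driver_pinned=()):
    """Decide pinning for one VM from its command interface's page-fault status.

    guest_pages: page frame numbers backing the VM (assumed model).
    guest_driver_pinned: pages the guest driver reports as pinned (assumed model).
    """
    guest_pages = frozenset(guest_pages)
    if status is PageFaultStatus.FULL:
        # Device page faults can be serviced (e.g., by a host driver), so no
        # page needs to stay resident and guest memory may be over-committed.
        pinned = frozenset()
    elif status is PageFaultStatus.PARTIAL:
        # Pin only the pages the guest driver pinned; a real VMM would pin
        # these in the second-level translation table.
        pinned = guest_pages & frozenset(guest_driver_pinned)
    else:
        # Assumption: without page-fault support, every guest page stays pinned.
        pinned = guest_pages
    return MemoryPolicy(pinned_pages=pinned, reclaimable_pages=guest_pages - pinned)


if __name__ == "__main__":
    policy = plan_memory_policy(PageFaultStatus.PARTIAL, {0, 1, 2, 3}, {1, 3})
    print(sorted(policy.pinned_pages), sorted(policy.reclaimable_pages))  # [1, 3] [0, 2]
```

In this sketch, a partially page-fault capable command interface still leaves the unpinned pages available for reclamation, which is one plausible reading of why only the guest driver's pinned pages are pinned.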

Example 152 is a computer-readable storage medium that stores instructions for execution by processing circuitry of a computing device to manage resources of at least one virtual machine (VM) in a virtualized computing environment, the instructions to cause the computing device to emulate at least one virtual device (VDEV) via a virtual device composition module (VDCM), the VDEV comprising at least one command interface, map the at least one command interface to at least one command interface memory-mapped I/O (MMIO) register of at least one physical device, the at least one command interface MMIO register associated with at least one backend resource, and map the at least one command interface to the VM to provide the VM access to the at least one backend resource.

Example 153 is the computer-readable storage medium of Example 152, the at least one command interface comprising at least one assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.

Example 154 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to map the at least one command interface to the VM using a second-level translation table of the at least one processor.

Example 155 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to provide a virtual machine manager (VMM), the VMM to generate the at least one VDEV.

Example 156 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to indicate a page-fault status of the at least one command interface.

Example 157 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to indicate a page-fault status of the at least one command interface, the page-fault status to indicate whether the at least one command interface is one of fully page-fault capable, partially page-fault capable, or not page-fault capable.

Example 158 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to support memory over-commit for the at least one VM responsive to determining that the at least one command interface is fully page-fault capable.

Example 159 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to support memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using at least one MMIO capability register of a host driver.

Example 160 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to support memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using a host driver.

Example 161 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to receive an indication that the at least one command interface is partially page-fault capable, and determine pinned memory pages of a guest driver associated with the at least one VM.

Example 162 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to receive an indication that the at least one command interface is partially page-fault capable, determine pinned memory pages of a guest driver associated with the at least one VM, and pin the pinned memory pages in a second-level translation table used by a virtual memory manager (VMM) associated with the at least one VM.

Example 163 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to determine at least one VM resource requirement for the at least one VM.

Example 164 is the computer-readable storage medium of Example 152, the instructions to cause the computing device to re-map the at least one command interface based on a VM resource requirement of the at least one VM.

Example 165 is the computer-readable storage medium of Example 152, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, the instructions to cause the computing device to unmap the first command interface from the first VM, and map the first command interface to the second VM.

Example 166 is the computer-readable storage medium of Example 152, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, the instructions to cause the computing device to determine a VM resource requirement for the first VM and the second VM, and re-map the first command interface to the second VM responsive to the VM resource requirement for the second VM being greater than for the first VM.

Example 167 is the computer-readable storage medium of Example 152, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with at least one different physical device.

Example 168 is the computer-readable storage medium of Example 152, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with a plurality of different physical devices, each of the plurality of different physical devices comprising a same type of physical device.

Example 169 is the computer-readable storage medium of Example 152, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with a plurality of different physical devices, at least a portion of the plurality of different physical devices comprising a different type of physical device.

Example 170 is an apparatus to manage resources of at least one virtual machine (VM) in a virtualized computing environment, comprising a virtual device means to emulate at least one virtual device (VDEV) via a virtual device composition module (VDCM), the VDEV comprising at least one command interface, and a mapping means to map the at least one command interface to at least one command interface memory-mapped I/O (MMIO) register of at least one physical device, the at least one command interface MMIO register associated with at least one backend resource, and map the at least one command interface to the at least one VM to provide the at least one VM access to the at least one backend resource.

Example 171 is the apparatus of Example 170, the at least one command interface comprising at least one assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.

Example 172 is the apparatus of Example 170, the mapping means to map the at least one command interface to the VM using a second-level translation table of the at least one processor.

Example 173 is the apparatus of Example 170, comprising a virtual machine manager (VMM) means to generate the at least one VDEV.

Example 174 is the apparatus of Example 170, comprising a memory overcommit means to indicate a page-fault status of the at least one command interface.

Example 175 is the apparatus of Example 170, comprising a memory overcommit means to indicate a page-fault status of the at least one command interface, the page-fault status to indicate whether the at least one command interface is one of fully page-fault capable, partially page-fault capable, or not page-fault capable.

Example 176 is the apparatus of Example 170, comprising a memory overcommit means to support memory over-commit for the at least one VM responsive to determining that the at least one command interface is fully page-fault capable.

Example 177 is the apparatus of Example 170, comprising a memory overcommit means to support memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using at least one MMIO capability register of a host driver.

Example 178 is the apparatus of Example 170, comprising a memory overcommit means to support memory over-commit for the at least one VM for at least one command interface that is fully page-fault capable via page-fault handling using a host driver.

Example 179 is the apparatus of Example 170, comprising a memory overcommit means to receive an indication that the at least one command interface is partially page-fault capable, and determine pinned memory pages of a guest driver associated with the at least one VM.

Example 180 is the apparatus of Example 170, comprising a memory overcommit means to receive an indication that the at least one command interface is partially page-fault capable, determine pinned memory pages of a guest driver associated with the at least one VM, and pin the pinned memory pages in a second-level translation table used by a virtual memory manager (VMM) associated with the at least one VM.

Example 181 is the apparatus of Example 170, comprising a device overcommit means to determine at least one VM resource requirement for the at least one VM.

Example 182 is the apparatus of Example 170, comprising a device overcommit means to re-map the at least one command interface based on a VM resource requirement of the at least one VM.

Example 183 is the apparatus of Example 170, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, comprising a device overcommit means to unmap the first command interface from the first VM, and map the first command interface to the second VM.

Example 184 is the apparatus of Example 170, the at least one VM comprising a first VM and a second VM, and the at least one command interface comprising a first command interface mapped to the first VM and a second command interface mapped to the second VM, comprising a device overcommit means to determine a VM resource requirement for the first VM and the second VM, and re-map the first command interface to the second VM responsive to the VM resource requirement for the second VM being greater than for the first VM.

Example 185 is the apparatus of Example 170, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with at least one different physical device.

Example 186 is the apparatus of Example 170, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with a plurality of different physical devices, each of the plurality of different physical devices comprising a same type of physical device.

Example 187 is the apparatus of Example 170, the at least one command interface comprising a plurality of at least one command interfaces, at least a portion of the at least one command interfaces associated with a plurality of different physical devices, at least a portion of the plurality of different physical devices comprising a different type of physical device.

Example 188 is a system to provide parallel decompression, comprising an apparatus according to any of Examples 170 to 187, and at least one radio frequency (RF) transceiver.
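Examples 145 to 148, 163 to 166, and 181 to 184 describe unmapping a command interface from a first VM and re-mapping it to a second VM when the second VM's resource requirement is greater. The Python sketch below models only the bookkeeping for that decision and is not the disclosed implementation. `CommandInterfaceTable` and `rebalance` are hypothetical names, and the resource requirement is assumed to be a single comparable number per VM. A real VMM would also remove the command interface's MMIO page from the first VM's second-level translation table and install it in the second VM's.

```python
class CommandInterfaceTable:
    """Hypothetical VMM-side record of which VM each command interface is mapped to.

    Only ownership bookkeeping is modeled; second-level translation table
    updates for the command interface MMIO pages are out of scope.
    """

    def __init__(self):
        self._owner = {}  # command interface id -> VM id

    def map(self, ci, vm):
        if ci in self._owner:
            raise ValueError(f"{ci} is already mapped to {self._owner[ci]}")
        self._owner[ci] = vm

    def unmap(self, ci):
        # Returns the VM the command interface was mapped to.
        return self._owner.pop(ci)

    def owner(self, ci):
        return self._owner.get(ci)

    def interfaces_of(self, vm):
        return sorted(ci for ci, owner in self._owner.items() if owner == vm)


def rebalance(table, ci, first_vm, second_vm, requirement):
    """Re-map `ci` from first_vm to second_vm if second_vm needs more resources.

    requirement: dict of VM id -> numeric resource requirement (assumed metric).
    Returns True if the command interface was re-mapped, False otherwise.
    """
    if table.owner(ci) != first_vm:
        raise ValueError(f"{ci} is not mapped to {first_vm}")
    if requirement[second_vm] > requirement[first_vm]:
        table.unmap(ci)            # unmap from the first VM
        table.map(ci, second_vm)   # map to the second VM
        return True
    return False


if __name__ == "__main__":
    table = CommandInterfaceTable()
    table.map("ci0", "vm1")
    table.map("ci1", "vm2")
    moved = rebalance(table, "ci0", "vm1", "vm2", {"vm1": 1, "vm2": 4})
    print(moved, table.interfaces_of("vm2"))  # True ['ci0', 'ci1']
```

Keeping the ownership decision separate from the translation-table updates is only an illustrative choice; it lets the comparison of resource requirements be checked without touching any page tables.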

Numerous specific details have been set forth herein to provide a thorough understanding of the embodiments. It will be understood by those skilled in the art, however, that the embodiments may be practiced without these specific details. In other instances, well-known operations, components, and circuits have not been described in detail so as not to obscure the embodiments. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the embodiments.

Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.

Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The embodiments are not limited in this context.

It should be noted that the methods described herein do not have to be executed in the order described, or in any particular order. Moreover, various activities described with respect to the methods identified herein can be executed in serial or parallel fashion.

Although specific embodiments have been illustrated and described herein, it should be appreciated that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all adaptations or variations of various embodiments. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein will be apparent to those of skill in the art upon reviewing the above description. Thus, the scope of various embodiments includes any other applications in which the above compositions, structures, and methods are used.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

What is claimed is:
 1. An apparatus, comprising: an interface to memory; and processor circuitry, the circuitry operable to execute one or more instructions to cause the circuitry to: generate a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of a physical device to be virtualized by a virtual machine monitor (VMM), the plurality of virtualized capability registers to define a plurality of device-specific capabilities of the physical device; determine a first version of a plurality of versions of the physical device to support via the VMM; expose a subset of the virtualized capability registers associated with the first version of the physical device to a virtual machine (VM); suspend the VM during migration of the VM to a migration destination; and clear modified bits in modified memory pages associated with the VM responsive to suspension of the VM, the modified memory pages comprising at least one of second-level translation tables or input/output memory management unit (IOMMU) mapped pages.
 2. The apparatus of claim 1, the circuitry to access the device-specific capability registers via a physical function (PF) driver of the physical device.
 3. The apparatus of claim 1, the circuitry to assign at least one command interface to the VM by mapping the at least one command interface into a memory-mapped input/output (MMIO) space of the VDEV.
 4. The apparatus of claim 1, the device-specific capability registers comprising memory-mapped input/output (MMIO) registers.
 5. The apparatus of claim 1, comprising an input/output (I/O) memory management unit (IOMMU) to provide accessed and dirty (A/D) bit support for the IOMMU.
 6. The apparatus of claim 1, the circuitry to detect dirty memory pages associated with the VM during VM migration using accessed and dirty (A/D) bit support of virtualization page-tables.
 7. The apparatus of claim 1, the circuitry to provide the device-specific capabilities of the physical device to a guest driver associated with the physical device via providing access to the subset of the virtualized capability registers to the guest driver.
 8. The apparatus of claim 1, the circuitry to receive a command interface migration state element from the physical device, the command interface migration state comprising command interface state information to resume a command interface after migration of the VM.
 9. The apparatus of claim 1, the circuitry to: receive, at the VMM, a page request service (PRS) request from the physical device during migration of the VM, and determine whether to handle the PRS request via the VMM or the VM.
 10. A method, comprising: generating a plurality of virtualized capability registers for a virtual device (VDEV) by virtualizing a plurality of device-specific capability registers of a physical device to be virtualized by a virtual machine monitor (VMM), the plurality of virtualized capability registers comprising a plurality of device-specific capabilities of the physical device; determining a first version of a plurality of versions of the physical device to support via the VMM; exposing a subset of the virtualized capability registers associated with the first version of the physical device to a virtual machine (VM); suspending the VM during migration of the VM to a migration destination; and clearing modified bits in modified memory pages associated with the VM responsive to suspension of the VM, the modified memory pages comprising at least one of second-level translation tables or input/output memory management unit (IOMMU) mapped pages.
 11. The method of claim 10, comprising accessing the device-specific capability registers via a physical function (PF) driver of the physical device.
 12. The method of claim 10, comprising assigning at least one command interface to the VM by mapping the at least one command interface into a memory-mapped input/output (MMIO) space of the VDEV.
 13. The method of claim 10, comprising providing accessed and dirty (A/D) bit support for an input/output memory management unit (IOMMU).
 14. The method of claim 10, comprising: receiving, at the VMM, page request service (PRS) requests from the physical device during migration of the VM; and determining whether to handle the PRS requests via the VMM or the VM.
 15. An apparatus, comprising: an interface to memory; and a processor, the processor comprising circuitry to: emulate at least one virtual device (VDEV) via a virtual device composition module (VDCM), the VDEV comprising a first command interface of a plurality of command interfaces; map the first command interface to a first command interface memory-mapped input/output (MMIO) register of a plurality of command interface MMIO registers of at least one physical device, the first command interface MMIO register associated with a first backend resource of a plurality of backend resources; map the first command interface to a first virtual machine (VM) to provide the first VM access to the first backend resource; determine a VM resource requirement for the first VM and a second VM; and re-map the first command interface to the second VM responsive to the VM resource requirement for the second VM being greater than the VM resource requirement for the first VM.
 16. The apparatus of claim 15, the first command interface comprising an assignable interface (AI) of a scalable input/output virtualization (S-IOV) architecture.
 17. The apparatus of claim 15, the circuitry to map the first command interface to the first VM using a second-level translation table of the processor.
 18. The apparatus of claim 15, the circuitry to indicate a page-fault status of the first command interface, the page-fault status to indicate whether the first command interface is one of fully page-fault capable, partially page-fault capable, or not page-fault capable.
 19. The apparatus of claim 15, the circuitry to support memory over-commit for the first VM responsive to determining that the first command interface is fully page-fault capable.
 20. The apparatus of claim 15, the circuitry to: receive an indication that the first command interface is partially page-fault capable, and determine pinned memory pages of a guest driver associated with the first VM.
 21. The apparatus of claim 15, the circuitry to re-map the first command interface from the second VM to a third VM.
 22. The apparatus of claim 15, wherein a second command interface of the plurality of command interfaces is to be mapped to the second VM.
 23. The apparatus of claim 15, wherein a second command interface of the plurality of command interfaces is associated with at least one different physical device.